Builds scalable, production-grade data pipelines and infrastructure using modern tools like Spark, Airflow, and dbt.
This skill empowers Claude with the specialized knowledge required to architect, implement, and optimize complex data systems. It covers the full lifecycle of data engineering, from designing robust ETL/ELT workflows and real-time streaming architectures to implementing advanced dimensional modeling and DataOps practices. Whether you are managing massive datasets with Spark, orchestrating workflows with Airflow, or ensuring data integrity with automated quality frameworks, this skill provides the best practices and implementation patterns necessary for reliable and high-performance data infrastructure.
Key Features
1. Design and implementation of batch and incremental ETL/ELT pipelines (see the Airflow sketch after this list).
2. Real-time event streaming architectures using Kafka and Spark Streaming (see the streaming sketch below).
3. Automated data quality validation and monitoring frameworks (see the quality-gate sketch below).
4. Performance tuning for SQL queries and large-scale distributed processing jobs (see the broadcast-join sketch below).
5. Advanced data modeling, including Star Schema, Snowflake, and Data Vault.
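For feature 1, a minimal sketch of an incremental ELT pipeline using Airflow's TaskFlow API. The DAG name, table semantics, and upsert logic are illustrative assumptions, not a fixed output of this skill:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def incremental_orders_elt():
    @task
    def extract(data_interval_start=None, data_interval_end=None):
        # Airflow injects the scheduled window; pull only rows inside it
        # so each run processes one incremental, repeatable slice.
        return {"since": str(data_interval_start), "until": str(data_interval_end)}

    @task
    def load(window: dict):
        # Upsert the slice keyed on its natural key so reruns stay idempotent.
        print(f"loading orders for {window['since']} .. {window['until']}")

    load(extract())


incremental_orders_elt()
```

Keying the load on the schedule's data interval is what makes the pipeline safely re-runnable: backfilling a day simply replays that day's slice.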
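For feature 2, a minimal sketch of a Kafka-to-Spark Structured Streaming job. The broker address, topic, and checkpoint path are placeholders, and the job assumes the spark-sql-kafka connector package is on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Kafka rows arrive as key/value bytes plus metadata (topic, offset, timestamp).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
    .select(
        F.col("value").cast("string").alias("payload"),
        F.col("timestamp").alias("ts"),
    )
)

# Tumbling one-minute counts; the watermark bounds how late data may arrive.
counts = (
    events.withWatermark("ts", "10 minutes")
    .groupBy(F.window("ts", "1 minute"))
    .count()
)

query = (
    counts.writeStream.outputMode("update")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")  # placeholder path
    .start()
)
query.awaitTermination()
```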
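For feature 3, a minimal sketch of an automated quality gate: declarative checks that fail a run before bad data reaches downstream consumers. The column names, thresholds, and pandas input are illustrative assumptions; a production framework would run comparable checks inside the warehouse:

```python
import pandas as pd


def run_quality_checks(df: pd.DataFrame) -> list[str]:
    failures = []
    if df["order_id"].isna().any():        # completeness: no null keys
        failures.append("order_id contains nulls")
    if df["order_id"].duplicated().any():  # uniqueness: primary key never repeats
        failures.append("order_id contains duplicates")
    if (df["amount"] < 0).any():           # validity: amounts are non-negative
        failures.append("amount contains negative values")
    # Freshness: the newest record should be less than a day old.
    if pd.Timestamp.now(tz="UTC") - df["created_at"].max() > pd.Timedelta(days=1):
        failures.append("data is staler than one day")
    return failures


orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "amount": [10.0, 25.5, 7.25],
        "created_at": pd.to_datetime(["2024-01-01"] * 3, utc=True),
    }
)
problems = run_quality_checks(orders)
if problems:
    # In a pipeline, raising here blocks downstream tasks from running.
    raise ValueError("quality gate failed: " + "; ".join(problems))
```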
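For feature 4, one common Spark tuning pattern, sketched with assumed table paths: broadcasting a small dimension table so a fact/dimension join avoids a full shuffle:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-tuning").getOrCreate()

facts = spark.read.parquet("/data/fact_sales")  # large fact table (placeholder path)
dims = spark.read.parquet("/data/dim_product")  # small dimension (placeholder path)

# The hint ships dim_product to every executor, turning a sort-merge
# (shuffle) join into a map-side broadcast hash join.
enriched = facts.join(broadcast(dims), on="product_id", how="left")
enriched.explain()  # the physical plan should now show BroadcastHashJoin
```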
Use Cases
1. Architecting a modern data lakehouse for unified analytics and reporting.
2. Migrating legacy data processes to a modern stack with dbt and Snowflake.
3. Implementing a comprehensive DataOps strategy with automated testing and observability (see the CI-gate sketch after this list).
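For use cases 2 and 3, a minimal sketch of a DataOps-style CI gate around dbt: build the models, then fail the deploy if any dbt test fails. It assumes the dbt CLI is installed, the working directory is a dbt project, and `orders+` is a hypothetical model selector:

```python
import subprocess
import sys


def run(cmd: list[str]) -> None:
    print("$", " ".join(cmd))
    if subprocess.run(cmd).returncode != 0:
        sys.exit(1)  # stop the deploy at the first failure


# Build the orders model and everything downstream of it, then test them.
run(["dbt", "run", "--select", "orders+"])
run(["dbt", "test", "--select", "orders+"])
```

Gating deploys on `dbt test` is what turns dbt's schema and data tests from documentation into an enforced contract.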