Builds and optimizes scalable data pipelines, modern data warehouses, and real-time streaming architectures using industry-standard tools.
This skill transforms Claude into a senior data engineer capable of designing, implementing, and optimizing complex data ecosystems. It covers everything from batch and streaming pipelines to data lakehouse architectures and cloud-native platforms like Snowflake, Databricks, and BigQuery. Use it to implement robust ETL/ELT workflows with dbt and Airflow, ensure data quality with Great Expectations, and manage large-scale data infrastructure using best practices in governance, security, and performance tuning.
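As a concrete sketch of the orchestration side, here is a minimal Airflow DAG using the TaskFlow API (assuming Airflow 2.4+; the DAG id, task bodies, and sample payload are illustrative placeholders, not part of the skill itself):

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def nightly_orders_etl():
    """Illustrative nightly ETL run: extract raw rows, transform, load."""

    @task
    def extract() -> list[dict]:
        # Placeholder: in practice, pull from a source API or database.
        return [{"order_id": "a1", "amount": "10.50"}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Cast amounts to float; heavier transformation logic would
        # typically live in dbt models rather than Python tasks.
        return [{**r, "amount": float(r["amount"])} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        # Placeholder: in practice, write to the warehouse (e.g. Snowflake).
        print(f"loaded {len(rows)} rows")

    load(transform(extract()))


nightly_orders_etl()
```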
Key Features
01. Modern data warehouse and lakehouse implementation on AWS, Azure, and GCP
02. Workflow orchestration and automation using Airflow, Dagster, and Prefect
03. Advanced data modeling including dimensional, Data Vault, and One Big Table (OBT) patterns
04. Integrated data quality frameworks, lineage tracking, and governance
05. End-to-end batch and streaming pipeline design using Spark, Flink, and Kafka (see the streaming sketch after this list)
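To make the streaming item concrete, here is a minimal PySpark Structured Streaming job that reads from Kafka and lands Parquet files. It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic name, schema, and output paths are all illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

# Requires the spark-sql-kafka-0-10 connector package on the Spark classpath.
spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Assumed event schema for the illustrative "orders" topic.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read raw Kafka records; broker address and topic are placeholders.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "orders")
    .load()
)

# Parse the JSON payload out of Kafka's binary value column.
orders = (
    events.select(from_json(col("value").cast("string"), schema).alias("o"))
    .select("o.*")
)

# Land the stream as Parquet; the paths stand in for a real lake location.
query = (
    orders.writeStream.format("parquet")
    .option("path", "/tmp/orders")
    .option("checkpointLocation", "/tmp/orders_ckpt")
    .start()
)
query.awaitTermination()  # blocks; in production this runs as a long-lived job
```

The checkpoint location is what gives the stream exactly-once file output across restarts, which is why it is set even in a throwaway sketch.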
Use Cases
01. Building a scalable dbt transformation layer for Snowflake or BigQuery environments
02. Implementing automated data quality monitoring to prevent production pipeline failures (see the quality-gate sketch after this list)
03. Designing a real-time change data capture (CDC) pipeline to sync production databases with a cloud data warehouse
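For the data quality use case, a minimal quality gate using the classic Great Expectations Pandas API (pre-1.0 releases; the column names, sample rows, and thresholds are assumptions for illustration):

```python
import great_expectations as ge
import pandas as pd

# Hypothetical batch of rows pulled from a staging table before loading.
df = pd.DataFrame({
    "order_id": ["a1", "a2", "a3"],
    "amount": [10.0, 25.5, 3.2],
})

# Wrap the frame so expectation methods become available (classic GE API).
batch = ge.from_pandas(df)

# Quality gates: null check, uniqueness, and a sanity range on amounts.
batch.expect_column_values_to_not_be_null("order_id")
batch.expect_column_values_to_be_unique("order_id")
batch.expect_column_values_to_be_between("amount", min_value=0)

# Fail fast: halt the pipeline run before bad data reaches the warehouse.
result = batch.validate()
if not result.success:
    raise ValueError("Data quality checks failed; aborting the load step")
```

Wired in as a task between the transform and load steps of a DAG like the one sketched earlier, a failure here stops the run instead of letting bad rows propagate downstream.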