Implements professional data engineering patterns for ETL pipelines, Spark processing, and robust data warehouse modeling.
This skill equips Claude with specialized knowledge for architecting and building modern data infrastructure. It provides production-ready patterns for Extract-Transform-Load (ETL) pipelines, high-performance Apache Spark configurations, and comprehensive data quality validation frameworks. Whether you are designing a star schema for a data warehouse or optimizing big data processing jobs, this skill ensures best practices are followed to prevent common anti-patterns like row-by-row processing or missing data lineage.
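The batch ETL shape described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation; the `run_batch` helper and its parameters are assumptions made for the example. The key pattern it shows is set-based batch processing with per-record error quarantine, so one malformed row cannot abort a whole run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class BatchResult:
    rows_loaded: int
    errors: list[str]

def run_batch(extract: Callable[[], Iterable[dict]],
              transform: Callable[[dict], dict],
              load: Callable[[list[dict]], int],
              batch_size: int = 1000) -> BatchResult:
    """Run extract -> transform -> load in fixed-size batches (illustrative sketch).

    Bad records are collected into an error list instead of raising,
    so a single malformed row cannot fail the whole pipeline.
    """
    errors: list[str] = []
    batch: list[dict] = []
    loaded = 0
    for record in extract():
        try:
            batch.append(transform(record))
        except Exception as exc:  # quarantine the bad record, keep going
            errors.append(f"{record!r}: {exc}")
        if len(batch) >= batch_size:
            loaded += load(batch)  # flush a full batch to the sink
            batch = []
    if batch:  # flush the final partial batch
        loaded += load(batch)
    return BatchResult(rows_loaded=loaded, errors=errors)
```

For example, `run_batch(extract=lambda: [{"name": "a"}, {}, {"name": "b"}], transform=lambda r: {"name": r["name"].upper()}, load=len)` would load two rows and quarantine the record missing `name`.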
Key Features
- Optimized Apache Spark patterns for window functions and partitioned writes
- Anti-pattern detection to ensure idempotency and scalable data processing
- Standardized ETL pipeline architecture with batch processing and error handling
- Star and Snowflake schema SQL modeling for efficient data warehousing
- Automated data quality validation for null checks, duplicates, and freshness
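The data quality checks in the last feature (nulls, duplicates, freshness) can be sketched as three small validators. These function names and the dict-based row shape are assumptions for illustration only; in practice the same checks would typically run against Spark or warehouse tables.

```python
from datetime import datetime, timedelta, timezone

def null_check(rows: list[dict], column: str) -> list[int]:
    """Return the indices of rows where `column` is missing or None."""
    return [i for i, r in enumerate(rows) if r.get(column) is None]

def duplicate_check(rows: list[dict], key: str) -> set:
    """Return key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        k = r.get(key)
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes

def freshness_check(rows: list[dict], ts_column: str,
                    max_age: timedelta) -> bool:
    """True if the newest timestamp is within `max_age` of now (UTC)."""
    newest = max(r[ts_column] for r in rows)
    return datetime.now(timezone.utc) - newest <= max_age
```

Wiring validators like these into the load step, rather than running them ad hoc, is what turns quality checks into automated monitoring.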
Use Cases
- Implementing automated data quality monitoring and alerting systems
- Architecting partitioned data lakes using PySpark and Parquet storage
- Building scalable ETL pipelines to move data from transactional DBs to warehouses
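The warehouse target in the last use case is typically a star schema: a narrow fact table of numeric measures, keyed to small descriptive dimension tables. A minimal sketch using SQLite (every table and column name here is an illustrative assumption, not part of the skill):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240115
        full_date TEXT NOT NULL
    );
    -- Fact table: foreign keys to dimensions plus numeric measures
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        amount       REAL NOT NULL
    );
""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO dim_date VALUES (20240115, '2024-01-15')")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 20240115, 9.5), (1, 20240115, 0.5)])

# The typical star-schema query: join the fact table to its
# dimensions and aggregate the measures.
total = conn.execute("""
    SELECT c.customer_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name
""").fetchone()
```

Keeping the fact table narrow and pushing descriptive text into dimensions is what makes these aggregate joins cheap at warehouse scale.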