Implements professional data engineering patterns for ETL pipelines, Spark processing, and robust data warehouse modeling.
This skill equips Claude with specialized knowledge for architecting and building modern data infrastructure. It provides production-ready patterns for Extract-Transform-Load (ETL) pipelines, high-performance Apache Spark configurations, and comprehensive data quality validation frameworks. Whether you are designing a star schema for a data warehouse or optimizing big data processing jobs, this skill ensures best practices are followed to prevent common anti-patterns like row-by-row processing or missing data lineage.
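The batch ETL shape described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation; the `run_batch` helper and its parameters are assumptions made for the example. The key pattern it shows is set-based batch processing with per-record error quarantine, so one malformed row cannot abort a whole run.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class BatchResult:
    rows_loaded: int
    errors: list[str]

def run_batch(extract: Callable[[], Iterable[dict]],
              transform: Callable[[dict], dict],
              load: Callable[[list[dict]], int],
              batch_size: int = 1000) -> BatchResult:
    """Run extract -> transform -> load in fixed-size batches (illustrative sketch).

    Bad records are collected into an error list instead of raising,
    so a single malformed row cannot fail the whole pipeline.
    """
    errors: list[str] = []
    batch: list[dict] = []
    loaded = 0
    for record in extract():
        try:
            batch.append(transform(record))
        except Exception as exc:  # quarantine the bad record, keep going
            errors.append(f"{record!r}: {exc}")
        if len(batch) >= batch_size:
            loaded += load(batch)  # flush a full batch to the sink
            batch = []
    if batch:  # flush the final partial batch
        loaded += load(batch)
    return BatchResult(rows_loaded=loaded, errors=errors)
```

For example, `run_batch(extract=lambda: [{"name": "a"}, {}, {"name": "b"}], transform=lambda r: {"name": r["name"].upper()}, load=len)` would load two rows and quarantine the record missing `name`.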
Key Features
- Optimized Apache Spark patterns for window functions and partitioned writes
- Anti-pattern detection to ensure idempotency and scalable data processing
- Standardized ETL pipeline architecture with batch processing and error handling
- Star and Snowflake schema SQL modeling for efficient data warehousing
- Automated data quality validation for null checks, duplicates, and freshness
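The data quality checks in the last feature (nulls, duplicates, freshness) can be sketched as three small validators. These function names and the dict-based row shape are assumptions for illustration only; in practice the same checks would typically run against Spark or warehouse tables.

```python
from datetime import datetime, timedelta, timezone

def null_check(rows: list[dict], column: str) -> list[int]:
    """Return the indices of rows where `column` is missing or None."""
    return [i for i, r in enumerate(rows) if r.get(column) is None]

def duplicate_check(rows: list[dict], key: str) -> set:
    """Return key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        k = r.get(key)
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes

def freshness_check(rows: list[dict], ts_column: str,
                    max_age: timedelta) -> bool:
    """True if the newest timestamp is within `max_age` of now (UTC)."""
    newest = max(r[ts_column] for r in rows)
    return datetime.now(timezone.utc) - newest <= max_age
```

Wiring validators like these into the load step, rather than running them ad hoc, is what turns quality checks into automated monitoring.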
Use Cases
- Implementing automated data quality monitoring and alerting systems
- Architecting partitioned data lakes using PySpark and Parquet storage
- Building scalable ETL pipelines to move data from transactional DBs to warehouses
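The warehouse target in the last use case is typically a star schema: a narrow fact table of numeric measures, keyed to small descriptive dimension tables. A minimal sketch using SQLite (every table and column name here is an illustrative assumption, not part of the skill):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240115
        full_date TEXT NOT NULL
    );
    -- Fact table: foreign keys to dimensions plus numeric measures
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        amount       REAL NOT NULL
    );
""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO dim_date VALUES (20240115, '2024-01-15')")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 20240115, 9.5), (1, 20240115, 0.5)])

# The typical star-schema query: join the fact table to its
# dimensions and aggregate the measures.
total = conn.execute("""
    SELECT c.customer_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name
""").fetchone()
```

Keeping the fact table narrow and pushing descriptive text into dimensions is what makes these aggregate joins cheap at warehouse scale.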