Implements cost-effective and high-performance AWS data architectures using Glue, S3, Athena, and Redshift best practices.
This skill provides specialized guidance for architecting data solutions on AWS, focusing on cost optimization and performance. It helps Claude generate efficient S3 data lake structures, optimized Glue ETL jobs, and high-performance Redshift schemas, emphasizing crucial patterns such as Hive-style partitioning, columnar storage, and event-driven processing. By following these battle-tested patterns, developers can avoid common pitfalls such as expensive full table scans in Athena or inefficient compute usage in Glue and Lambda, keeping data pipelines both scalable and economical.
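The Hive-style partitioning pattern mentioned above can be sketched as a small path-building helper. The bucket and table names below are hypothetical; the point is the `key=value` path segments that Athena and Glue recognize as partitions.

```python
from datetime import date


def hive_partition_prefix(base: str, dt: date) -> str:
    """Build a Hive-style partition prefix (year=/month=/day=) under a base S3 path.

    Athena and Glue treat key=value path segments as partition columns, so a
    query filtered on year/month/day reads only the matching prefixes instead
    of scanning the whole table.
    """
    return (
        f"{base.rstrip('/')}/"
        f"year={dt.year:04d}/month={dt.month:02d}/day={dt.day:02d}/"
    )


# Example: a daily batch of events lands under its own partition prefix.
prefix = hive_partition_prefix("s3://my-data-lake/events", date(2024, 1, 15))
# → "s3://my-data-lake/events/year=2024/month=01/day=15/"
```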
Key Features
- Athena query cost control and Partition Projection patterns
- S3 data lake path structuring and Hive-style partitioning
- Step Functions orchestration for reliable ETL workflows
- Glue PySpark job templates with job bookmarks and predicate push-down
- Redshift table design with DISTKEY and SORTKEY optimization
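A minimal sketch of the Glue PySpark job template described in the features above. The catalog database `lakehouse`, table `events`, and output bucket are hypothetical, and job bookmarks take effect only when the job runs with `--job-bookmark-option job-bookmark-enable`.

```python
def partition_predicate(year: int, month: int) -> str:
    """Build a push-down predicate selecting one month of Hive-style partitions."""
    return f"year = '{year:04d}' AND month = '{month:02d}'"


def run_job() -> None:
    """Job body; in a real Glue script this would be called at module level."""
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)  # init/commit bracket job-bookmark tracking

    # Predicate push-down: Glue lists and reads only the matching S3 partitions
    # instead of loading the entire table before filtering.
    frame = glue_context.create_dynamic_frame.from_catalog(
        database="lakehouse",                # hypothetical catalog database
        table_name="events",                 # hypothetical catalog table
        push_down_predicate=partition_predicate(2024, 1),
        transformation_ctx="read_events",    # required for bookmarks to track this source
    )

    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": "s3://my-curated-bucket/events/"},
        format="parquet",  # columnar output keeps downstream Athena scans cheap
    )
    job.commit()  # persists the bookmark so the next run skips processed data
```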
Use Cases
- Developing production-grade ETL pipelines with AWS Glue and Step Functions
- Building a scalable, cost-effective data lake on S3 and Athena
- Optimizing Redshift warehouse performance for complex analytical queries
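For the Redshift use case, the DISTKEY/SORTKEY design called out in the features can be sketched as a fact-table DDL. The table and column names are hypothetical; the pattern is distributing on the common join key and sorting on the common range filter.

```python
def redshift_fact_table_ddl() -> str:
    """Sketch of a Redshift fact-table DDL using DISTKEY and SORTKEY.

    DISTKEY (customer_id) co-locates rows sharing a join key on the same
    slice, avoiding network redistribution during joins on customer_id.
    SORTKEY (event_date) lets range-restricted scans skip disk blocks via
    zone maps, so date-filtered analytical queries read far less data.
    """
    return """
    CREATE TABLE sales_events (
        event_id     BIGINT         NOT NULL,
        customer_id  BIGINT         NOT NULL,
        event_date   DATE           NOT NULL,
        amount       DECIMAL(12, 2)
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)
    SORTKEY (event_date);
    """


ddl = redshift_fact_table_ddl()
```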