Implements cost-effective and high-performance AWS data architectures using Glue, S3, Athena, and Redshift best practices.
This skill provides specialized guidance for architecting data solutions on AWS, focusing on cost optimization and performance. It helps Claude generate efficient S3 data lake structures, optimized Glue ETL jobs, and high-performance Redshift schemas, emphasizing crucial patterns such as Hive-style partitioning, columnar storage, and event-driven processing. By following these battle-tested patterns, developers can avoid common pitfalls such as expensive full table scans in Athena or inefficient compute usage in Glue and Lambda, keeping data pipelines both scalable and economical.
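The Hive-style partitioning pattern mentioned above can be sketched as a small path-building helper. The bucket and table names below are hypothetical; the point is the `key=value` path segments that Athena and Glue recognize as partitions.

```python
from datetime import date


def hive_partition_prefix(base: str, dt: date) -> str:
    """Build a Hive-style partition prefix (year=/month=/day=) under a base S3 path.

    Athena and Glue treat key=value path segments as partition columns, so a
    query filtered on year/month/day reads only the matching prefixes instead
    of scanning the whole table.
    """
    return (
        f"{base.rstrip('/')}/"
        f"year={dt.year:04d}/month={dt.month:02d}/day={dt.day:02d}/"
    )


# Example: a daily batch of events lands under its own partition prefix.
prefix = hive_partition_prefix("s3://my-data-lake/events", date(2024, 1, 15))
# → "s3://my-data-lake/events/year=2024/month=01/day=15/"
```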
Key Features
- Athena query cost control and Partition Projection patterns
- S3 data lake path structuring and Hive-style partitioning
- Step Functions orchestration for reliable ETL workflows
- Glue PySpark job templates with job bookmarks and predicate push-down
- Redshift table design with DISTKEY and SORTKEY optimization
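A minimal sketch of the Glue PySpark job template described in the features above. The catalog database `lakehouse`, table `events`, and output bucket are hypothetical, and job bookmarks take effect only when the job runs with `--job-bookmark-option job-bookmark-enable`.

```python
def partition_predicate(year: int, month: int) -> str:
    """Build a push-down predicate selecting one month of Hive-style partitions."""
    return f"year = '{year:04d}' AND month = '{month:02d}'"


def run_job() -> None:
    """Job body; in a real Glue script this would be called at module level."""
    import sys

    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)  # init/commit bracket job-bookmark tracking

    # Predicate push-down: Glue lists and reads only the matching S3 partitions
    # instead of loading the entire table before filtering.
    frame = glue_context.create_dynamic_frame.from_catalog(
        database="lakehouse",                # hypothetical catalog database
        table_name="events",                 # hypothetical catalog table
        push_down_predicate=partition_predicate(2024, 1),
        transformation_ctx="read_events",    # required for bookmarks to track this source
    )

    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": "s3://my-curated-bucket/events/"},
        format="parquet",  # columnar output keeps downstream Athena scans cheap
    )
    job.commit()  # persists the bookmark so the next run skips processed data
```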
Use Cases
- Developing production-grade ETL pipelines with AWS Glue and Step Functions
- Building a scalable, cost-effective data lake on S3 and Athena
- Optimizing Redshift warehouse performance for complex analytical queries
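For the Redshift use case, the DISTKEY/SORTKEY design called out in the features can be sketched as a fact-table DDL. The table and column names are hypothetical; the pattern is distributing on the common join key and sorting on the common range filter.

```python
def redshift_fact_table_ddl() -> str:
    """Sketch of a Redshift fact-table DDL using DISTKEY and SORTKEY.

    DISTKEY (customer_id) co-locates rows sharing a join key on the same
    slice, avoiding network redistribution during joins on customer_id.
    SORTKEY (event_date) lets range-restricted scans skip disk blocks via
    zone maps, so date-filtered analytical queries read far less data.
    """
    return """
    CREATE TABLE sales_events (
        event_id     BIGINT         NOT NULL,
        customer_id  BIGINT         NOT NULL,
        event_date   DATE           NOT NULL,
        amount       DECIMAL(12, 2)
    )
    DISTSTYLE KEY
    DISTKEY (customer_id)
    SORTKEY (event_date);
    """


ddl = redshift_fact_table_ddl()
```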