Designs scalable data lake and lakehouse architectures using optimized partitioning, storage tiers, and high-performance formats such as the Parquet file format and the Apache Iceberg table format.
This skill lets Claude act as a senior data architect, providing prescriptive guidance on organizing large-scale data in cloud storage. It specializes in modern lakehouse patterns, helping users implement three-tier storage layouts (Bronze/Silver/Gold), select partitioning strategies that avoid the "small file problem," and choose between plain Parquet files and Apache Iceberg tables when transactional integrity is required. Whether you are building a new data platform or optimizing an existing one, it offers best practices for schema evolution, storage lifecycle policies, and high-performance data modeling.
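As a rough illustration of the three-tier layout with time-based partitioning, the sketch below builds Hive-style `key=value` prefixes per tier. The function name `partition_path`, the lowercase tier names, and the bucket URL are hypothetical choices made for this example, not part of the skill itself.

```python
from datetime import date

# Assumed tier names, following the Bronze/Silver/Gold convention.
TIERS = ("bronze", "silver", "gold")


def partition_path(root: str, tier: str, table: str, day: date) -> str:
    """Build a Hive-style, date-partitioned prefix for one table in one tier."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return (
        f"{root.rstrip('/')}/{tier}/{table}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


if __name__ == "__main__":
    # Hypothetical bucket and table names.
    print(partition_path("s3://example-lake", "bronze", "events", date(2024, 3, 5)))
```

Zero-padding the month and day keeps prefixes in lexicographic date order, which tends to make range listings and lifecycle rules easier to express.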
Key Features
Time-based, multi-dimensional, and hash partitioning strategies
Schema design patterns including wide tables and nested structures
Table format selection between Parquet and Apache Iceberg
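To make the partitioning trade-off concrete, here is a minimal sketch of two helpers: one assigns a key to a hash bucket, the other estimates the average file size a partitioning scheme would produce. Both function names and the SHA-256 hash are assumptions for illustration. In particular, this is not Apache Iceberg's own bucket transform, which is based on a Murmur3 hash.

```python
import hashlib


def hash_bucket(key: str, num_buckets: int) -> int:
    """Map a key to a stable bucket in [0, num_buckets) using SHA-256."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the bucket count.
    return int.from_bytes(digest[:8], "big") % num_buckets


def avg_file_size_mb(total_mb: float, num_partitions: int, files_per_partition: int = 1) -> float:
    """Estimate the average file size if data spreads evenly across partitions."""
    return total_mb / (num_partitions * files_per_partition)


if __name__ == "__main__":
    # Hypothetical workload: 10 GiB per year, partitioned hourly vs. into 40 buckets.
    print(avg_file_size_mb(10 * 1024, 365 * 24))  # hourly: files of roughly 1 MB each
    print(avg_file_size_mb(10 * 1024, 40))        # 40 buckets: 256 MB each
    print(hash_bucket("customer-123", 40))
```

In this hypothetical workload, hourly partitions yield files near 1 MB, which is the small file problem in miniature, while 40 hash buckets yield 256 MB files. The right target size depends on the query engine and storage system, so treat any threshold as a tuning parameter.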