About
This skill streamlines the development of PySpark transformation pipelines within medallion architecture data lakes. By querying DuckDB warehouses and parsing data dictionaries before generating code, it ensures that the code adheres to the exact column names, data types, and business rules in effect. It bridges the gap between raw data sources and analytical layers by identifying mapping conventions, applying deduplication strategies, and implementing standard ETL patterns, ultimately reducing schema-related errors and manual debugging in data engineering workflows. A sketch of the overall pattern follows.
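The snippet below is a minimal, illustrative sketch of the workflow described above, not part of the skill itself: it reads the authoritative schema from a DuckDB warehouse, then builds a PySpark bronze-to-silver transform that deduplicates on a business key and conforms to that schema. The warehouse path, table names, and key columns (`order_id`, `updated_at`) are hypothetical placeholders.

```python
import duckdb
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

# 1. Query the DuckDB warehouse for the exact column names and ordering of the
#    target table so the generated transform matches the real schema.
#    "warehouse.duckdb", the 'silver' schema, and the 'orders' table are
#    assumptions for illustration only.
con = duckdb.connect("warehouse.duckdb", read_only=True)
schema_rows = con.execute(
    """
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'silver' AND table_name = 'orders'
    ORDER BY ordinal_position
    """
).fetchall()
target_columns = [name for name, _ in schema_rows]

# 2. Build the bronze-to-silver transform in PySpark: keep only the columns
#    present in the target schema and deduplicate on the business key by
#    retaining the most recent record per key.
spark = SparkSession.builder.appName("bronze_to_silver_orders").getOrCreate()
bronze = spark.read.parquet("lake/bronze/orders")  # hypothetical bronze path

dedup_window = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
silver = (
    bronze
    .withColumn("_rn", F.row_number().over(dedup_window))
    .filter(F.col("_rn") == 1)   # keep the latest version of each order
    .drop("_rn")
    .select(*target_columns)     # enforce the warehouse column set and order
)

silver.write.mode("overwrite").parquet("lake/silver/orders")
```

In practice the skill derives the key columns, ordering column, and target layer from the data dictionary rather than hard-coding them as done here.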