Implements production-ready Databricks Lakehouse architectures using Unity Catalog and the medallion pattern.
This skill provides a comprehensive framework for designing and deploying Databricks environments following industry best practices. It automates the creation of workspace structures, Unity Catalog governance models, and three-tier (Bronze, Silver, Gold) data processing pipelines. It is particularly useful when establishing new projects, reviewing existing architecture for performance and security, or implementing CI/CD workflows using Databricks Asset Bundles to ensure consistency across development, staging, and production environments.
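The Asset Bundle workflow mentioned above is driven by a `databricks.yml` file at the project root. A minimal sketch of one is below; the bundle name, workspace hosts, job name, and notebook path are all illustrative placeholders, not values prescribed by this skill:

```yaml
bundle:
  name: lakehouse_etl          # hypothetical project name

targets:
  dev:
    mode: development          # resources get dev-prefixed, schedules paused
    default: true
    workspace:
      host: https://adb-dev.example.azuredatabricks.net    # placeholder URL
  prod:
    mode: production
    workspace:
      host: https://adb-prod.example.azuredatabricks.net   # placeholder URL

resources:
  jobs:
    medallion_refresh:         # hypothetical job
      name: medallion_refresh
      tasks:
        - task_key: bronze_to_silver
          notebook_task:
            notebook_path: ../src/bronze_to_silver.py
```

Deploying the same bundle to `dev` and `prod` targets is what gives the consistency across environments described above.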
Key Features
1. Production-ready medallion architecture implementation for structured data refinement
2. Declarative CI/CD configuration using Databricks Asset Bundles
3. Performance-optimized compute strategies for ETL jobs and SQL warehouses
4. Built-in Delta Lake maintenance routines, including OPTIMIZE and VACUUM schedules
5. Automated Unity Catalog hierarchy setup with environment-isolated catalogs and schemas
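The environment-isolated catalog hierarchy and the Delta maintenance routines in the feature list can be sketched as generated SQL. This is a minimal illustration assuming one catalog per environment and one schema per medallion layer; the project name, environment list, and retention window are assumptions, not values fixed by the skill:

```python
# Sketch: generate Unity Catalog DDL for environment-isolated catalogs with
# medallion schemas, plus Delta Lake maintenance statements.
# All names (environments, layers, "sales") are illustrative.

ENVIRONMENTS = ["dev", "staging", "prod"]
LAYERS = ["bronze", "silver", "gold"]

def catalog_ddl(project: str) -> list[str]:
    """One catalog per environment, one schema per medallion layer."""
    statements = []
    for env in ENVIRONMENTS:
        catalog = f"{project}_{env}"
        statements.append(f"CREATE CATALOG IF NOT EXISTS {catalog};")
        for layer in LAYERS:
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{layer};")
    return statements

def maintenance_sql(table: str, retention_hours: int = 168) -> list[str]:
    """OPTIMIZE compacts small files; VACUUM removes unreferenced data files
    older than the retention window (168 hours = 7 days is the Delta default)."""
    return [
        f"OPTIMIZE {table};",
        f"VACUUM {table} RETAIN {retention_hours} HOURS;",
    ]

ddl = catalog_ddl("sales")
print(ddl[0])   # → CREATE CATALOG IF NOT EXISTS sales_dev;
print(maintenance_sql("sales_prod.gold.orders")[0])   # → OPTIMIZE sales_prod.gold.orders;
```

In practice these statements would be scheduled as a Databricks job rather than printed; generating them from one table keeps dev, staging, and prod structurally identical.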
Use Cases
1. Modernizing legacy Spark workloads into a Unity Catalog-managed Lakehouse
2. Bootstrapping a new Databricks workspace with enterprise-grade governance and security
3. Establishing standardized project structures for collaborative data engineering teams
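The Bronze-to-Silver-to-Gold refinement that these use cases build on can be sketched without a Spark cluster; plain Python records stand in for DataFrame rows here, and the field names (`order_id`, `amount`, `region`) are hypothetical:

```python
# Illustrative medallion refinement on plain Python records.
# Bronze: raw events as ingested; Silver: deduplicated and typed; Gold: aggregated.

bronze = [
    {"order_id": "1", "amount": "10.50", "region": "emea"},
    {"order_id": "1", "amount": "10.50", "region": "emea"},  # duplicate ingest
    {"order_id": "2", "amount": "bad",   "region": "amer"},  # malformed amount
    {"order_id": "3", "amount": "4.25",  "region": "amer"},
]

def to_silver(rows):
    """Deduplicate on order_id and drop rows whose amount fails to parse."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # quarantine/skip malformed records
        seen.add(row["order_id"])
        silver.append({**row, "amount": amount})
    return silver

def to_gold(rows):
    """Aggregate revenue per region for reporting."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

print(to_gold(to_silver(bronze)))  # → {'emea': 10.5, 'amer': 4.25}
```

On Databricks the same two steps would be Delta table transformations (Bronze and Silver tables in their respective Unity Catalog schemas), but the layering logic is the same.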