Executes comprehensive platform migrations from legacy data systems to Databricks with automated schema conversion and validation.
This skill provides a structured framework for migrating complex data workloads from legacy platforms, including Hadoop, Snowflake, Redshift, and Oracle, directly into Databricks. It automates schema conversion to Delta Lake, manages partitioned data transfers, and updates ETL pipelines for Unity Catalog compatibility. With prioritized wave planning and rigorous validation scripts, it minimizes downtime and prevents data loss during high-stakes enterprise transitions to a Lakehouse architecture.
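As a rough illustration of the schema-conversion step, the sketch below maps a handful of legacy Oracle column types to Delta Lake-compatible Spark types. The type map, column inventory, and helper function are illustrative assumptions, not the skill's actual implementation.

```python
# Hypothetical sketch: map legacy Oracle column types to Delta Lake-compatible
# Spark SQL types. The inventory and type map below are illustrative only.
from pyspark.sql.types import (
    StructType, StructField, DecimalType, StringType, TimestampType, DateType
)

# Assumed inventory entry for one legacy table:
# (column_name, oracle_type, precision, scale)
LEGACY_COLUMNS = [
    ("order_id",   "NUMBER",   10, 0),
    ("customer",   "VARCHAR2", None, None),
    ("order_date", "DATE",     None, None),
    ("amount",     "NUMBER",   12, 2),
]

def to_spark_type(oracle_type, precision, scale):
    """Convert a legacy Oracle type to a Delta-compatible Spark type (simplified)."""
    if oracle_type == "NUMBER":
        return DecimalType(precision or 38, scale or 0)
    if oracle_type in ("VARCHAR2", "CHAR", "CLOB"):
        return StringType()
    if oracle_type == "DATE":
        return DateType()
    if oracle_type == "TIMESTAMP":
        return TimestampType()
    raise ValueError(f"Unmapped legacy type: {oracle_type}")

# Build the target Delta schema from the legacy inventory.
delta_schema = StructType([
    StructField(name, to_spark_type(t, p, s), nullable=True)
    for name, t, p, s in LEGACY_COLUMNS
])

print(delta_schema.simpleString())
```

A real conversion would also need to handle vendor-specific edge cases (unbounded NUMBER, time zones, LOBs), which is exactly the tedium the skill is meant to automate.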
Key Features
1. Prioritized migration planning with wave assignments and metadata inventory
2. Automated source schema conversion to Delta Lake-compatible types
3. Detailed 6-step cutover procedures with built-in rollback strategies
4. Partitioned data migration with automated row-count and schema validation (first sketch after this list)
5. Conversion of legacy ETL jobs such as Oozie workflows and spark-submit scripts to Databricks Workflows (second sketch after this list)
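For the validation in item 4, a minimal sketch of what a row-count and schema check might look like, assuming a staged copy of the source table already exists in the workspace; the table names are placeholders.

```python
# Hypothetical validation sketch: compare row counts and column sets between a
# staged copy of the source table and the migrated Delta table. Table names
# below are placeholders for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("migration-validation").getOrCreate()

SOURCE_TABLE = "legacy_staging.orders"  # assumed staging copy of the source
TARGET_TABLE = "main.sales.orders"      # assumed Unity Catalog target table

src = spark.table(SOURCE_TABLE)
tgt = spark.table(TARGET_TABLE)

# Row-count check: totals must match exactly before cutover.
src_count, tgt_count = src.count(), tgt.count()
assert src_count == tgt_count, f"Row count mismatch: {src_count} vs {tgt_count}"

# Schema check: every source column must exist in the target. Types are not
# compared one-to-one here, since legitimate widening (e.g. NUMBER to DECIMAL)
# can occur during Delta conversion.
missing = set(src.columns) - set(tgt.columns)
assert not missing, f"Columns missing from target: {sorted(missing)}"

print(f"Validation passed: {tgt_count} rows, {len(tgt.columns)} columns")
```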
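And for the legacy-job conversion in item 5, here is a sketch of re-expressing a spark-submit invocation as a Databricks Workflows job via the Jobs 2.1 REST API. The workspace URL, token, JAR path, main class, and cluster settings are all placeholder assumptions.

```python
# Hypothetical sketch: recreate a legacy spark-submit JAR job as a Databricks
# Workflows job through the Jobs 2.1 REST API. All values are placeholders.
import requests

DATABRICKS_HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "dapi..."  # placeholder personal access token

# A legacy `spark-submit --class com.example.OrdersEtl orders-etl.jar --date ...`
# job, re-expressed as a single Workflows task on a jobs cluster.
job_spec = {
    "name": "orders-etl-migrated",
    "tasks": [
        {
            "task_key": "orders_etl",
            "spark_jar_task": {
                "main_class_name": "com.example.OrdersEtl",
                "parameters": ["--date", "2024-01-01"],
            },
            "libraries": [{"jar": "dbfs:/jars/orders-etl.jar"}],
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # placeholder runtime
                "node_type_id": "i3.xlarge",          # cloud-specific placeholder
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

Oozie coordinators would map onto the job's schedule and task dependencies rather than a single task, but the create-job call is the same.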
Use Cases
1. Migrating on-premises Hadoop clusters to Databricks for cloud modernization
2. Transitioning from Snowflake or Redshift to Databricks to consolidate data stacks
3. Replatforming legacy Oracle or Teradata warehouses to a modern Lakehouse architecture