Generates optimized Apache Spark jobs and ETL pipelines for robust big data processing and transformation.
Spark Job Creator is a specialized Claude Code skill that streamlines the development of data pipelines built on Apache Spark. It generates production-ready code for ETL processes, data transformations, and streaming workflows while enforcing industry best practices. Whether you are orchestrating complex data movements with Airflow or implementing real-time streaming, the skill provides step-by-step guidance, validates configurations, and produces scalable boilerplate to speed up your data engineering work.
Key Features
1. Optimization of Spark Structured Streaming configurations
2. Validation of job outputs against common data engineering standards
3. Automated Spark boilerplate generation for PySpark and Scala
4. Implementation of industry-standard ETL and data transformation patterns
5. Integration support for workflow orchestration tools like Apache Airflow
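The generated PySpark boilerplate could look roughly like the sketch below. It is a minimal illustration, not the skill's actual output: the function names (`normalize_amounts`, `run_job`), the CSV input layout, and the parquet sink are all assumptions made for the example. The transformation is factored into a pure Python function so it can be unit-tested without a running cluster, a common pattern for testable Spark jobs.

```python
def normalize_amounts(rows):
    """Pure transformation: (id, amount_cents) -> (id, amount_dollars).

    Kept free of Spark imports so it can be unit-tested without a cluster.
    """
    return [(rid, cents / 100.0) for rid, cents in rows]


def run_job(input_path, output_path):
    """Minimal PySpark ETL entry point (assumes pyspark is installed)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
    try:
        # Extract: read raw CSV lines as (id, amount_cents) tuples.
        rdd = (spark.sparkContext.textFile(input_path)
                    .map(lambda line: line.split(","))
                    .map(lambda parts: (int(parts[0]), int(parts[1]))))
        # Transform: apply the pure function partition by partition.
        cleaned = rdd.mapPartitions(normalize_amounts)
        # Load: persist the result as parquet.
        cleaned.toDF(["id", "amount_dollars"]).write.parquet(output_path)
    finally:
        spark.stop()
```

Separating the pure transform from the Spark entry point keeps the business logic testable with plain Python data, while the `run_job` wrapper owns session lifecycle and I/O.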
Use Cases
1. Migrating legacy data scripts into optimized Spark applications
2. Developing real-time streaming applications for high-velocity data
3. Setting up new Spark ETL pipelines for large-scale data lake processing
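A real-time streaming job of the kind listed above might be sketched as follows. This is an illustrative assumption, not the skill's literal output: the Kafka source, parquet sink, trigger interval, and every path or server name are placeholders, and running it requires pyspark plus the spark-sql-kafka connector package. The source options are built by a small pure helper so that part can be verified without a cluster.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Pure helper assembling Kafka reader options (unit-testable)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def run_stream(bootstrap_servers, topic, output_path, checkpoint_dir):
    """Minimal Structured Streaming pipeline: Kafka -> parquet.

    Assumes pyspark and the spark-sql-kafka package are available.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("stream-sketch").getOrCreate()
    reader = spark.readStream.format("kafka")
    for key, value in kafka_source_options(bootstrap_servers, topic).items():
        reader = reader.option(key, value)

    # Checkpointing lets the query recover its offsets after a failure.
    query = (reader.load()
                   .selectExpr("CAST(value AS STRING) AS payload")
                   .writeStream
                   .format("parquet")
                   .option("path", output_path)
                   .option("checkpointLocation", checkpoint_dir)
                   .trigger(processingTime="30 seconds")
                   .start())
    query.awaitTermination()
```

The micro-batch trigger and checkpoint location are the two settings most worth tuning here: the trigger trades latency against small-file overhead at the sink, and the checkpoint directory must be stable storage for exactly-once recovery.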