Introduction
This skill provides a standardized framework for building data pipelines in Claude Code, with a focus on reproducibility and traceability for machine learning workflows. Multiple data sources (SQL databases, REST APIs, web scrapers, and synthetic data generators) are declared in a single YAML configuration. The skill automates fetching, merging, validation, and splitting so that a training dataset can be reconstructed identically across model iterations, and it handles credentials and other secrets through environment variables.
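To make the reproducibility idea concrete, here is a minimal Python sketch of how a seeded split and environment-based secrets might work. It is not the skill's actual implementation: the `CONFIG` layout, the `ORDERS_DB_DSN` variable, and every function name below are assumptions made up for illustration, and the configuration is written as a Python dict as a stand-in for the YAML file.

```python
import hashlib
import os
import random

# Hypothetical configuration. In the skill this would live in a YAML file;
# the keys and names here are illustrative assumptions, not the real schema.
CONFIG = {
    "sources": [
        {"name": "orders_db", "type": "sql", "dsn_env": "ORDERS_DB_DSN"},
        {"name": "synthetic", "type": "synthetic", "rows": 10, "seed": 7},
    ],
    "split": {"train": 0.8, "seed": 42},
}


def resolve_secret(env_name):
    """Read a credential from the environment so it never sits in the config."""
    value = os.environ.get(env_name)
    if value is None:
        raise KeyError(f"environment variable {env_name} is not set")
    return value


def synthetic_rows(n, seed):
    """Generate n rows from a seeded generator so reruns give the same data."""
    rng = random.Random(seed)
    return [{"id": i, "value": rng.random()} for i in range(n)]


def split_rows(rows, train_fraction, seed):
    """Split rows into train/test deterministically.

    Rows are sorted by id first, so the result does not depend on the
    order in which the sources happened to return them.
    """
    ordered = sorted(rows, key=lambda r: r["id"])
    random.Random(seed).shuffle(ordered)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fingerprint(rows):
    """Hash the rows so two runs can be compared for traceability."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(sorted(row.items())).encode("utf-8"))
    return digest.hexdigest()
```

Calling `split_rows(synthetic_rows(10, seed=7), 0.8, seed=42)` twice returns the same train and test partitions, and `fingerprint` gives matching hashes for them. A pipeline could log that hash to confirm that a rebuilt dataset is identical to the one used in an earlier run.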