Will this skill commit my large data files to Git?

No. The skill includes guardrails against committing large datasets to version control and tracks their metadata instead.

How does it verify data integrity?

It uses checksums stored within a manifest file (JSON/TOML/CSV) to ensure that the dataset source remains consistent across environments.
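As a rough illustration of the checksum idea, the sketch below verifies files against a JSON manifest. The manifest layout (a top-level `files` list with `path` and `sha256` keys), the choice of SHA-256, and the function names are assumptions made for this example, not the skill's documented format.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, data_root: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs.

    Assumed (hypothetical) manifest shape:
    {"files": [{"path": "train.csv", "sha256": "<hex digest>"}]}
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:
        target = data_root / entry["path"]
        if not target.is_file() or sha256_of(target) != entry["sha256"]:
            problems.append(entry["path"])
    return problems
```

An empty list means every file listed in the manifest is present and matches its recorded checksum. Only the small manifest needs to live in Git; the data files themselves can stay outside version control.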

Does this skill require DVC or external storage?

No, it prioritizes lightweight tracking using manifests and checksums, though it respects existing DVC setups if they are already in use.

When is the best time to use this refactoring skill?

It should be used when you need to make input data changes explicit and ensure that results can be accurately traced back to a specific version of source data.

Data Versioning Refactor

Author: Silviase


Data Science & ML

Implements lightweight dataset tracking and reproducibility patterns to ensure data changes are explicit and traceable.

This skill provides a structured workflow for implementing data versioning within projects, focusing on reproducibility without the overhead of complex external tools. It guides Claude to define dataset sources, generate checksums, maintain metadata manifests, and separate raw data from derived artifacts. By recording dataset versions alongside experiment outputs, it ensures that every result is linked to the specific data state that produced it, making it ideal for data science projects and ML pipelines where tracking data evolution is critical.
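The workflow described above could look roughly like the following sketch, which builds a manifest for a raw-data directory and writes the dataset version next to an experiment's metrics. The field names, the `run.json` file name, and all function names are hypothetical choices for illustration; the skill's actual output format is not specified here.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(raw_dir: Path, source: str, version: str) -> dict:
    """Describe every file under raw_dir by relative path, size, and SHA-256."""
    files = []
    for path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        files.append({
            "path": path.relative_to(raw_dir).as_posix(),
            "bytes": path.stat().st_size,
            # read_bytes() keeps the sketch short; stream in chunks for large files.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return {"source": source, "version": version, "files": files}


def manifest_id(manifest: dict) -> str:
    """Short fingerprint of the manifest, stable across key ordering."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


def write_run_record(out_dir: Path, manifest: dict, metrics: dict) -> Path:
    """Store the dataset version alongside the results it produced."""
    out_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "dataset_version": manifest["version"],
        "dataset_manifest_id": manifest_id(manifest),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }
    record_path = out_dir / "run.json"
    record_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record_path
```

Keeping raw inputs in one directory and run outputs in another, as this sketch assumes, mirrors the separation of raw data from derived artifacts. Because the fingerprint changes whenever any file's checksum changes, two runs with the same `dataset_manifest_id` used byte-identical inputs.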

Key Features

01 Integration of dataset versions into experiment and output logging

02 Clear separation of raw data from derived/processed artifacts

03 Lightweight metadata tracking designed to avoid repository bloat

04 0 GitHub stars

05 Automated creation of dataset manifests in CSV, JSON, or TOML formats

06 Source and version tracking with checksum-based integrity verification

Use Cases

01 Ensuring reproducibility in machine learning experiments by linking outputs to data states

02 Documenting data lineage and transformations within automated data pipelines

03 Transitioning from unmanaged local datasets to structured, documented data manifests


Install with 🐟 Skill.Fish

npx skillfish add silviase/default-python-project refactoring-12-data-versioning
