How does this skill detect changes in my data structure?

The skill monitors specific trigger files such as R/schemas.R, R/data_loading.R, and Python pipeline scripts for modifications to column names, return values, and factor levels.
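Below is a minimal Python sketch of how trigger-file matching could work. It assumes the list of changed paths comes from something like `git diff --name-only`. The `scripts/*.py` pattern and the `triggered_files` helper are hypothetical and are not the skill's actual configuration.

```python
from fnmatch import fnmatchcase

# Hypothetical trigger list: the two R paths come from the answer above,
# the Python glob is an assumed location for pipeline scripts.
TRIGGER_PATTERNS = ("R/schemas.R", "R/data_loading.R", "scripts/*.py")


def triggered_files(changed_paths, patterns=TRIGGER_PATTERNS):
    """Return the changed paths that match at least one trigger pattern."""
    # fnmatchcase is case-sensitive on every OS; note that "*" also matches "/".
    return [p for p in changed_paths if any(fnmatchcase(p, pat) for pat in patterns)]


if __name__ == "__main__":
    changed = ["R/schemas.R", "README.md", "scripts/run_pipeline.py", "R/plots.R"]
    print(triggered_files(changed))  # ['R/schemas.R', 'scripts/run_pipeline.py']
```

Only files that pass this filter would then be inspected for changed column names, return values, or factor levels.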

Does it automatically overwrite my documentation?

No. The skill identifies the necessary changes and proposes specific text updates, which you review and confirm before they are applied to your data dictionary.
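Python's standard `difflib` can illustrate the review-before-apply step. The proposed edit is built as a unified diff, and the file stays untouched until someone approves it. This is a sketch under that assumption: `propose_update` is a hypothetical helper, and a plain string replacement stands in for however the skill actually locates the text to change.

```python
import difflib


def propose_update(doc_text, old, new, path="data-dictionary.md"):
    """Return a unified diff that replaces `old` with `new`; nothing is written to disk."""
    updated = doc_text.replace(old, new)
    diff = difflib.unified_diff(
        doc_text.splitlines(keepends=True),
        updated.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)  # empty string when there is nothing to change


if __name__ == "__main__":
    doc = "| sample_id | character | Sample identifier |\n"
    print(propose_update(doc, "sample_id", "sample"))
```

In this sketch, the updated text would be written back to `data-dictionary.md` only after the reviewer accepts the diff.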

Can this skill handle column renaming?

Yes. It detects when a column is renamed in the code, updates the documentation accordingly, and prompts you to verify that no other code still uses the old name.
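Here is a rough sketch of both halves of rename handling. It rests on two assumptions. First, a rename is inferred only when a name changes at the same column position, which is a deliberately simplistic heuristic. Second, stale references are found with a whole-word regex search over `.R`, `.py`, and `.smk` files. All function names are hypothetical. A real tool would also need to scan files without these suffixes, such as a `Snakefile`.

```python
import re
from pathlib import Path


def detect_renames(old_cols, new_cols):
    """Heuristic: a changed name at the same position, whose old name is gone, is a rename."""
    if len(old_cols) != len(new_cols):
        return {}  # columns added or dropped; too ambiguous for this simple rule
    return {
        old: new
        for old, new in zip(old_cols, new_cols)
        if old != new and old not in new_cols
    }


def matching_lines(name, text):
    """Return 1-based line numbers where `name` appears as a whole word."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    return [n for n, line in enumerate(text.splitlines(), start=1) if pattern.search(line)]


def find_stale_references(old_name, root=".", suffixes=(".R", ".py", ".smk")):
    """Scan source files under `root` for remaining uses of `old_name`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(errors="ignore")
            hits.extend((str(path), n) for n in matching_lines(old_name, text))
    return hits


if __name__ == "__main__":
    renames = detect_renames(["sample_id", "chrom", "pos"], ["sample", "chrom", "pos"])
    print(renames)  # {'sample_id': 'sample'}
    print(matching_lines("sample_id", "x <- df$sample_id\ny <- df$sample\n"))  # [1]
```

In a project checkout, a call such as `find_stale_references("sample_id", root="R")` would return the remaining uses as (path, line) pairs for the user to check.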

What happens if I add an entirely new dataset?

When a new loading function or pipeline output is detected, the skill will propose a new documentation section for that dataset within your data-dictionary.md.
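You can picture the proposed section as a Markdown stub with a column table whose descriptions are left for a person to fill in. The heading and table layout below are assumptions, and the skill does not necessarily produce this format. `new_dataset_section` is a hypothetical helper.

```python
def new_dataset_section(name, source, columns):
    """Render a Markdown stub for a new dataset; descriptions are TODO placeholders."""
    lines = [
        f"## {name}",
        "",
        f"Source: `{source}`",
        "",
        "| Column | Type | Description |",
        "|--------|------|-------------|",
    ]
    for col, dtype in columns.items():
        lines.append(f"| {col} | {dtype} | TODO |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(new_dataset_section(
        "Benchmark results",
        "R/data_loading.R",
        {"sample_id": "character", "f1_score": "double"},
    ))
```

Leaving descriptions as placeholders keeps the user in the loop. The code can reveal names and types but not what a column means.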

Data Dictionary Sync

Name: Data Dictionary Sync
Author: nate-d-olson

Category: Data Science & ML

Maintains documentation consistency by proposing updates to data dictionaries whenever R schemas or pipeline outputs change.

The Data Dictionary Sync skill ensures that your project's documentation remains a source of truth by monitoring changes in R loading functions, Arrow schemas, and Snakemake pipeline scripts. It proactively detects renamed columns, new factor levels, and modified output formats in the codebase, proposing precise updates to your data-dictionary.md file. This prevents documentation rot and ensures that data scientists and developers are always working with accurate descriptions of the underlying data structures.
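At its core, schema drift detection compares the documented schema with the one the code currently produces. The sketch below assumes both have already been parsed into `{column_name: Column}` mappings. It omits how they are extracted from R, Arrow, or Snakemake outputs; for Parquet files, `pyarrow.parquet.read_schema` is one possible source in Python. `Column` and `schema_drift` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    dtype: str
    levels: tuple = ()  # factor levels; empty for non-factor columns


def schema_drift(documented, actual):
    """Compare documented vs. actual {name: Column} mappings and summarize the differences."""
    shared = sorted(set(documented) & set(actual))
    report = {
        "added": sorted(set(actual) - set(documented)),
        "removed": sorted(set(documented) - set(actual)),
        "type_changed": [],
        "new_levels": {},
    }
    for name in shared:
        doc, cur = documented[name], actual[name]
        if doc.dtype != cur.dtype:
            report["type_changed"].append((name, doc.dtype, cur.dtype))
        # Only levels present in the code but missing from the docs are reported.
        extra = [lvl for lvl in cur.levels if lvl not in doc.levels]
        if extra:
            report["new_levels"][name] = extra
    return report


if __name__ == "__main__":
    documented = {
        "sample_id": Column("character"),
        "platform": Column("factor", ("Illumina", "ONT")),
        "qual": Column("integer"),
    }
    actual = {
        "sample_id": Column("character"),
        "platform": Column("factor", ("Illumina", "ONT", "PacBio")),
        "qual": Column("double"),
        "depth": Column("integer"),
    }
    print(schema_drift(documented, actual))
```

In this sketch, each non-empty entry in the report would map to one proposed edit in `data-dictionary.md`. Levels dropped from the code are not reported, though the check would mirror the one shown.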

Key Features

01 Synchronizes column names and data types with Markdown documentation

02 Updates factor levels and validation rules in real time

03 Generates documentation sections for new datasets and cached objects

04 Detects schema drift automatically across R and Python scripts

05 Supports Snakemake workflow output format monitoring

Use Cases

01 Ensuring project documentation reflects current R schemas after refactoring loading functions

02 Maintaining a consistent data dictionary in bioinformatics and genomic benchmarking projects

03 Streamlining the onboarding process by providing up-to-date data definitions for team members
