Automates code review for R Tidymodels workflows to prevent data leakage and ensure statistical best practices.
This skill acts as a specialized auditor for R data science projects, focusing on the Tidymodels ecosystem and 'Tidy Modeling with R' (TMwR) principles. It systematically scans R scripts for critical anti-patterns such as data leakage, improper resampling, and workflow mismanagement. By identifying issues like preprocessing before splitting or missing stratification in imbalanced datasets, it helps data scientists build more robust, reproducible, and statistically valid machine learning models while reducing the risk of overly optimistic performance estimates.
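As an illustration of the "split first, then preprocess" ordering described above, here is a minimal sketch using `rsample` and `recipes`. The data frame `dat`, its columns, and the seed value are hypothetical and exist only for this example; the skill's actual checks are not shown in this listing.

```r
library(rsample)
library(recipes)

set.seed(123)  # fix the seed so the random split is reproducible

# Hypothetical imbalanced dataset: 30 "rare" rows, 170 "common" rows
dat <- data.frame(
  x1 = rnorm(200),
  x2 = rnorm(200),
  y  = factor(rep(c("rare", "common"), times = c(30, 170)))
)

# Split BEFORE any preprocessing, stratifying on the imbalanced outcome
split <- initial_split(dat, prop = 0.75, strata = y)
train <- training(split)
test  <- testing(split)

# Estimate normalization statistics from the training set only
rec <- recipe(y ~ ., data = train) |>
  step_normalize(all_numeric_predictors())
prepped <- prep(rec, training = train)

# Apply the already-estimated statistics to the test set
test_baked <- bake(prepped, new_data = test)
```

Because `prep()` only ever sees `train`, the test rows contribute nothing to the means and standard deviations used for normalization, which is the leakage pattern the description refers to.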
Key Features
01 Identifies resampling violations, including missing stratification for imbalanced data
02 Enforces 'workflow' object usage to automate safe preprocessing and fitting
03 Detects critical data leakage patterns, such as prepping recipes on test data
04 Validates evaluation logic to prevent testing on training data
05 Checks for reproducibility by flagging missing random seeds in stochastic operations
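The practices in this list can be combined in a single pipeline. The following is a rough sketch, assuming the `tidymodels` meta-package is installed; the toy data, the logistic regression model, the fold count, and the seed are illustrative choices rather than anything the skill prescribes.

```r
library(tidymodels)

set.seed(2024)  # seed set before every stochastic step below

# Hypothetical imbalanced classification data
dat <- tibble(
  x1 = rnorm(300),
  x2 = rnorm(300),
  y  = factor(rep(c("yes", "no"), times = c(45, 255)))
)

# Stratified train/test split and stratified CV folds on the training set
split <- initial_split(dat, strata = y)
folds <- vfold_cv(training(split), v = 5, strata = y)

# Recipe declared on training data; it is not prepped by hand
rec <- recipe(y ~ ., data = training(split)) |>
  step_normalize(all_numeric_predictors())

# Bundling recipe and model in a workflow means preprocessing is
# re-estimated inside each resample rather than on the full data
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(logistic_reg() |> set_engine("glm"))

# Resampled performance estimate from the training set only
cv_res <- fit_resamples(wf, resamples = folds)
cv_metrics <- collect_metrics(cv_res)

# Fit once on the training set and evaluate once on the held-out test set
final_res <- last_fit(wf, split)
test_metrics <- collect_metrics(final_res)
```

Keeping the test set out of everything except `last_fit()` is what guards against evaluating on training data, and the `workflow` object removes the need to call `prep()` and `bake()` manually.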
Use Cases
01 Peer-reviewing data science scripts to ensure statistical validity and no data leakage
02 Auditing complex machine learning pipelines for imbalanced classification tasks
03 Onboarding developers to the Tidymodels ecosystem using TMwR best practices