About
The cleaning-data skill is a cornerstone of the DataPeeker framework. It is designed to turn raw, messy data into high-quality datasets ready for rigorous analysis. It automates complex quality checks, including exact and near-duplicate detection via fuzzy matching, outlier identification based on the median absolute deviation (MAD), and referential integrity validation. It guides users through a structured five-phase process, from reviewing the quality report to final verification, and ensures that every cleaning decision is documented with a clear rationale. This minimizes bias and strengthens the validity of the insights drawn in later analysis.
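The three kinds of quality checks named above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not DataPeeker's implementation. The function names `mad_outliers`, `near_duplicates`, and `orphan_keys` are hypothetical. The 3.5 cutoff and the 0.6745 scaling constant follow a common convention for MAD-based "modified z-score" outlier rules. `difflib.SequenceMatcher` with a 0.9 similarity ratio stands in for whatever fuzzy matcher the skill actually uses.

```python
from difflib import SequenceMatcher
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Assumed rule (hypothetical helper): modified z = 0.6745 * (x - median) / MAD,
    where MAD is the median of absolute deviations from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values equal the median, so there is no spread to scale by.
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


def near_duplicates(strings, min_ratio=0.9):
    """Return index pairs (i, j) of exact or near-duplicate strings.

    Assumption: light normalization (trim + lowercase), then difflib's
    similarity ratio as the fuzzy-matching score. Pairwise, so O(n^2).
    """
    normalized = [s.strip().lower() for s in strings]
    pairs = []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if ratio >= min_ratio:
                pairs.append((i, j))
    return pairs


def orphan_keys(child_fks, parent_pks):
    """Return foreign-key values with no matching parent key (nulls ignored)."""
    parents = set(parent_pks)
    return sorted({fk for fk in child_fks if fk is not None and fk not in parents})


if __name__ == "__main__":
    print(mad_outliers([10.0, 11.0, 9.5, 10.5, 250.0]))
    print(near_duplicates(["Acme Corp", "acme corp ", "ACME Corp.", "Globex"]))
    print(orphan_keys([1, 2, 2, 7, None], [1, 2, 3]))
```

Using the median and MAD instead of the mean and standard deviation keeps the outlier rule from being distorted by the very outliers it is trying to find: in the first example, the value 250.0 barely moves the median (10.5) or the MAD (0.5), so it stands out with a modified z-score far above 3.5.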