概要
This skill empowers Claude to conduct thorough performance assessments of machine learning models within the development environment. By leveraging the model-evaluation-suite plugin, it automates the calculation of critical validation metrics, benchmarks different model versions, and provides actionable insights for improving model reliability before deployment. It is particularly useful for data scientists and ML engineers who need to validate TimeGPT pipelines or custom AI models directly through natural language commands in Claude Code.