소개
This skill empowers Claude to perform deep diagnostic evaluations of machine learning models directly within your development environment. By leveraging the model-evaluation-suite plugin, it automates the calculation of critical performance metrics like F1-score, precision, and recall, enabling developers to compare model versions, validate performance on held-out datasets, and identify specific areas for optimization before deployment.