About
This skill automates comparative testing of agentic AI skills by running them across the Claude model family: Sonnet, Opus, and Haiku. Instead of reporting a binary pass/fail result, it applies weighted scoring to identify model-specific pitfalls and compute quality-based scores. Scenarios run in parallel via sub-agents. The results help developers judge production readiness, find the most cost-effective compatible model for a given task, and record historical performance directly in the repository's README.
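As a rough illustration of how weighted scoring and cost-aware model selection could fit together, here is a minimal sketch. It is not the skill's actual implementation: the check names, weights, scores, cost ranks, and the 0.8 threshold are all hypothetical values chosen for the example, and `CheckResult`, `weighted_score`, and `cheapest_compatible` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CheckResult:
    name: str
    weight: float  # hypothetical importance of this check
    score: float   # quality in [0.0, 1.0] rather than a pass/fail flag


def weighted_score(results: List[CheckResult]) -> float:
    """Weighted mean of the per-check quality scores."""
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return 0.0
    return sum(r.weight * r.score for r in results) / total_weight


def cheapest_compatible(scores: Dict[str, float],
                        cost_rank: Dict[str, int],
                        threshold: float) -> Optional[str]:
    """Lowest-cost model whose score meets the threshold, or None."""
    passing = [model for model, s in scores.items() if s >= threshold]
    if not passing:
        return None
    return min(passing, key=lambda model: cost_rank[model])


# Illustrative data only; these are not measured results.
runs = {
    "haiku": [CheckResult("follows_steps", 3.0, 0.5),
              CheckResult("output_format", 1.0, 1.0)],
    "sonnet": [CheckResult("follows_steps", 3.0, 1.0),
               CheckResult("output_format", 1.0, 0.5)],
    "opus": [CheckResult("follows_steps", 3.0, 1.0),
             CheckResult("output_format", 1.0, 1.0)],
}
# Assumed relative cost ordering (1 = cheapest), not real pricing.
cost_rank = {"haiku": 1, "sonnet": 2, "opus": 3}

scores = {model: weighted_score(results) for model, results in runs.items()}
best = cheapest_compatible(scores, cost_rank, threshold=0.8)
print(scores, best)
```

With a weighted mean, a model that partly fails a heavily weighted check drops below the threshold even if it passes every minor check, which is the kind of distinction a binary pass/fail count would miss.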