About
The Evaluation Skill is a framework for building and managing evaluation suites for agentic tools, following a Spec-Test-Driven Development (STDD) process. Developers define success criteria in standardized spec.md and rubric.md files, and each suite validates agent behavior in two complementary ways: deterministic code-based checks and qualitative LLM-as-judge assessments. Use it to verify that an AI agent meets its domain requirements, produces high-quality reasoning traces, and stays reliable throughout the development lifecycle.
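The hybrid validation idea can be illustrated with a minimal sketch. This is not the skill's actual API, which this section does not show: the names `EvalResult`, `check_deterministic`, `evaluate`, and `stub_judge`, as well as the 0.7 threshold, are assumptions made for illustration. The judge is a local stub standing in for a real LLM call so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical; they only illustrate the hybrid pattern.


@dataclass
class EvalResult:
    deterministic_pass: bool  # outcome of the code-based check
    judge_score: float        # qualitative score in [0, 1]
    passed: bool              # overall verdict


def check_deterministic(output: str, required_terms: list[str]) -> bool:
    """Code-based check: every required term must appear in the output."""
    lowered = output.lower()
    return all(term.lower() in lowered for term in required_terms)


def evaluate(
    output: str,
    required_terms: list[str],
    judge: Callable[[str], float],
    threshold: float = 0.7,  # assumed pass mark, not taken from the source
) -> EvalResult:
    """Run the deterministic check first, then the qualitative judge."""
    det = check_deterministic(output, required_terms)
    # Skip the potentially costly judge call when the hard check already fails.
    score = judge(output) if det else 0.0
    return EvalResult(det, score, det and score >= threshold)


def stub_judge(output: str) -> float:
    """Stand-in for an LLM-as-judge call; scores by word count only."""
    return min(len(output.split()) / 10, 1.0)


result = evaluate(
    "The refund was issued to the original payment method within five business days.",
    required_terms=["refund", "payment"],
    judge=stub_judge,
)
print(result)
```

Running the deterministic check before the judge keeps failures cheap and reproducible. In a real suite, the required terms and the judge's scoring criteria would presumably come from spec.md and rubric.md respectively.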