About
The Evaluation skill provides a methodology for assessing non-deterministic agent systems, moving beyond traditional software testing to outcome-focused assessment. It lets developers implement multi-dimensional rubrics covering factual accuracy, tool efficiency, and citation quality, and score them using LLM-as-judge patterns. Complexity stratification and token-budget analysis help verify that agentic workflows stay reliable and efficient as context and complexity scale.
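As a rough illustration of how a multi-dimensional rubric, complexity stratification, and token tracking might fit together, here is a minimal Python sketch. It is not the skill's actual implementation: the names `EvalResult`, `weighted_score`, and `summarize_by_complexity`, the weights, and the complexity labels are all assumptions made for this example. The per-dimension scores are assumed to come from an LLM judge or any other grader and to lie in the range 0 to 1.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical rubric weights; the dimensions come from the description above,
# the numeric weights are illustrative only.
WEIGHTS = {
    "factual_accuracy": 0.5,
    "tool_efficiency": 0.25,
    "citation_quality": 0.25,
}


@dataclass
class EvalResult:
    complexity: str           # illustrative label, e.g. "simple" or "complex"
    scores: dict[str, float]  # per-dimension scores in [0, 1], e.g. from an LLM judge
    tokens_used: int          # total tokens the agent consumed on this task


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-dimension scores into one weighted score in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[dim] * scores[dim] for dim in weights) / total_weight


def summarize_by_complexity(results):
    """Group results by complexity level and report mean score and mean tokens."""
    buckets = defaultdict(list)
    for result in results:
        buckets[result.complexity].append(result)

    summary = {}
    for level, items in buckets.items():
        summary[level] = {
            "mean_score": sum(weighted_score(r.scores) for r in items) / len(items),
            "mean_tokens": sum(r.tokens_used for r in items) / len(items),
        }
    return summary
```

Reporting scores per complexity level, rather than as one global average, makes it easier to see whether quality drops or token use climbs on harder tasks.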