Implements iterative reflection and evaluation loops to optimize AI agent outputs through self-critique and structured scoring.
The agentic-eval skill provides a framework for moving beyond single-shot AI generation by implementing recursive refinement cycles. It lets Claude apply self-critique loops, evaluator-optimizer pipelines, and rubric-based scoring to produce high-quality, production-ready results. Whether you are generating complex code that requires test-driven validation or technical reports that must adhere to specific style guides, this skill provides the architectural patterns needed to reduce hallucinations and improve the precision of agentic workflows.
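The evaluator-optimizer pattern described above can be sketched as a small loop. This is a minimal illustration, not the skill's actual implementation: `generate` and `critique` are hypothetical stand-ins for real LLM calls, and the threshold and round limit are arbitrary.

```python
# Sketch of an evaluator-optimizer loop. generate() and critique() are
# toy stand-ins for LLM calls; a real pipeline would pass the critique
# back to the model as additional context for the next draft.

def generate(prompt, feedback=None):
    # Stand-in generator: "revises" the draft when feedback is present.
    return prompt.upper() if feedback else prompt

def critique(draft):
    # Stand-in evaluator: returns (score, feedback-or-None).
    score = 1.0 if draft.isupper() else 0.4
    feedback = None if score >= 0.8 else "Use uppercase."
    return score, feedback

def refine(prompt, threshold=0.8, max_rounds=3):
    """Generate, score, and regenerate until the score clears the bar."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        score, feedback = critique(draft)
        if score >= threshold:
            break
    return draft, score
```

The key design choice is that the evaluator returns both a score (to decide when to stop) and textual feedback (to steer the next generation), rather than a bare pass/fail.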
Key Features
- LLM-as-judge comparison and ranking patterns
- Evaluator-optimizer pipeline architectures
- Test-driven code refinement workflows
- Rubric-based scoring and weighted dimensions
- Self-critique and reflection loops
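Rubric-based scoring with weighted dimensions might look like the following sketch. The dimension names and weights are illustrative assumptions, not part of the skill itself.

```python
# Sketch of rubric-based scoring: each dimension gets a 0-1 score and a
# weight; the total is the weighted sum. Rubric contents are made up.

RUBRIC = {
    "correctness": 0.5,
    "clarity": 0.3,
    "style": 0.2,
}

def weighted_score(dimension_scores, rubric=RUBRIC):
    """Combine per-dimension scores (0-1) into one weighted total."""
    if set(dimension_scores) != set(rubric):
        raise ValueError("every rubric dimension must be scored")
    return sum(rubric[d] * dimension_scores[d] for d in rubric)

total = weighted_score({"correctness": 0.9, "clarity": 0.8, "style": 1.0})
# 0.5*0.9 + 0.3*0.8 + 0.2*1.0 = 0.89
```

Weighting the dimensions lets an evaluator prioritize, for example, correctness over style when deciding whether a draft clears the acceptance threshold.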
Use Cases
- Compliance and style guide enforcement for AI-generated content
- Iterative refinement of technical documentation and reports
- Quality-critical code generation and automated debugging
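For the code-generation use case, test-driven refinement can be sketched as accepting a candidate only when it passes a test suite, with failures fed back into the next attempt. The `run_tests` harness and the candidate functions below are illustrative stand-ins for model-generated revisions.

```python
# Sketch of test-driven code refinement: try successive candidate
# implementations, keeping the first one that passes all tests.
# Candidates stand in for model-generated drafts and revisions.

def run_tests(func):
    """Return a list of failure messages; empty means all tests pass."""
    failures = []
    if func(2, 3) != 5:
        failures.append("add(2, 3) should be 5")
    if func(-1, 1) != 0:
        failures.append("add(-1, 1) should be 0")
    return failures

def refine_code(candidates):
    """Accept the first candidate with no failures; in a real pipeline
    the failure messages would prompt the model's next revision."""
    failures = []
    for func in candidates:
        failures = run_tests(func)
        if not failures:
            return func, []
    return None, failures

buggy = lambda a, b: a - b   # first "draft" with a bug
fixed = lambda a, b: a + b   # revised draft after feedback
best, failures = refine_code([buggy, fixed])
```

Using executable tests as the evaluator grounds the loop in objective pass/fail signals rather than subjective self-critique.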