Implements iterative reflection and evaluation loops to optimize AI agent outputs through self-critique and structured scoring.
The agentic-eval skill provides a framework for moving beyond single-shot AI generation by implementing recursive refinement cycles. It lets Claude apply self-critique loops, evaluator-optimizer pipelines, and rubric-based scoring to produce high-quality, production-ready results. Whether you are generating complex code that requires test-driven validation or technical reports that must adhere to specific style guides, this skill provides the architectural patterns needed to reduce hallucinations and improve the precision of agentic workflows.
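The evaluator-optimizer pattern described above can be sketched as a small loop. This is a minimal illustration, not the skill's actual implementation: `generate` and `critique` are hypothetical stand-ins for real LLM calls, and the threshold and round limit are arbitrary.

```python
# Sketch of an evaluator-optimizer loop. generate() and critique() are
# toy stand-ins for LLM calls; a real pipeline would pass the critique
# back to the model as additional context for the next draft.

def generate(prompt, feedback=None):
    # Stand-in generator: "revises" the draft when feedback is present.
    return prompt.upper() if feedback else prompt

def critique(draft):
    # Stand-in evaluator: returns (score, feedback-or-None).
    score = 1.0 if draft.isupper() else 0.4
    feedback = None if score >= 0.8 else "Use uppercase."
    return score, feedback

def refine(prompt, threshold=0.8, max_rounds=3):
    """Generate, score, and regenerate until the score clears the bar."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        score, feedback = critique(draft)
        if score >= threshold:
            break
    return draft, score
```

The key design choice is that the evaluator returns both a score (to decide when to stop) and textual feedback (to steer the next generation), rather than a bare pass/fail.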
Key Features
- LLM-as-judge comparison and ranking patterns
- Evaluator-optimizer pipeline architectures
- Test-driven code refinement workflows
- Rubric-based scoring and weighted dimensions
- Self-critique and reflection loops
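Rubric-based scoring with weighted dimensions might look like the following sketch. The dimension names and weights are illustrative assumptions, not part of the skill itself.

```python
# Sketch of rubric-based scoring: each dimension gets a 0-1 score and a
# weight; the total is the weighted sum. Rubric contents are made up.

RUBRIC = {
    "correctness": 0.5,
    "clarity": 0.3,
    "style": 0.2,
}

def weighted_score(dimension_scores, rubric=RUBRIC):
    """Combine per-dimension scores (0-1) into one weighted total."""
    if set(dimension_scores) != set(rubric):
        raise ValueError("every rubric dimension must be scored")
    return sum(rubric[d] * dimension_scores[d] for d in rubric)

total = weighted_score({"correctness": 0.9, "clarity": 0.8, "style": 1.0})
# 0.5*0.9 + 0.3*0.8 + 0.2*1.0 = 0.89
```

Weighting the dimensions lets an evaluator prioritize, for example, correctness over style when deciding whether a draft clears the acceptance threshold.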
Use Cases
- Compliance and style guide enforcement for AI-generated content
- Iterative refinement of technical documentation and reports
- Quality-critical code generation and automated debugging
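For the code-generation use case, test-driven refinement can be sketched as accepting a candidate only when it passes a test suite, with failures fed back into the next attempt. The `run_tests` harness and the candidate functions below are illustrative stand-ins for model-generated revisions.

```python
# Sketch of test-driven code refinement: try successive candidate
# implementations, keeping the first one that passes all tests.
# Candidates stand in for model-generated drafts and revisions.

def run_tests(func):
    """Return a list of failure messages; empty means all tests pass."""
    failures = []
    if func(2, 3) != 5:
        failures.append("add(2, 3) should be 5")
    if func(-1, 1) != 0:
        failures.append("add(-1, 1) should be 0")
    return failures

def refine_code(candidates):
    """Accept the first candidate with no failures; in a real pipeline
    the failure messages would prompt the model's next revision."""
    failures = []
    for func in candidates:
        failures = run_tests(func)
        if not failures:
            return func, []
    return None, failures

buggy = lambda a, b: a - b   # first "draft" with a bug
fixed = lambda a, b: a + b   # revised draft after feedback
best, failures = refine_code([buggy, fixed])
```

Using executable tests as the evaluator grounds the loop in objective pass/fail signals rather than subjective self-critique.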