About
The Eval Harness skill brings Eval-Driven Development (EDD) to Claude Code and treats AI evaluations as the equivalent of unit tests for AI-assisted development. Developers define success criteria before implementation, run capability and regression evals continuously, and measure reliability with pass@k metrics. It combines deterministic code checks with model-based grading to help keep AI-generated code stable, high-quality, and feature-complete across long development cycles, giving a structured path from idea to production-ready code.
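pass@k is commonly read as the probability that at least one of k attempts at a task passes its eval. The sketch below uses the unbiased estimator widely used for code-generation benchmarks: run n attempts, count the c that pass, and compute 1 - C(n-c, k) / C(n, k). Whether the Eval Harness skill computes pass@k exactly this way is an assumption, and `pass_at_k` is a hypothetical helper name, not part of the skill.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n recorded attempts, c of which passed.

    Hypothetical helper (not part of the Eval Harness skill). Returns the
    probability that a random draw of k attempts, without replacement,
    contains at least one passing attempt.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If there are fewer than k failing attempts, every draw of k includes a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 attempts at one eval, 3 of which passed.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

For example, 3 passing runs out of 10 give pass@1 = 0.3 and pass@5 ≈ 0.917, which shows how allowing more attempts raises the measured success rate for a flaky capability.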