About
The Eval Harness skill provides a formal framework for Eval-Driven Development (EDD) in Claude Code. It treats evaluations as the unit tests of AI development: developers define success criteria before implementation, track progress with pass@k metrics, and guard against regressions with automated checks. The skill helps teams move from trial-and-error AI interactions to a measurable, repeatable development process in which Claude's outputs are checked against specific project requirements.
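As a rough illustration of the pass@k metric mentioned above, the sketch below assumes the common definition: the probability that at least one of k sampled attempts passes, estimated from n recorded attempts of which c passed. The function name `pass_at_k` and its parameters are hypothetical and are not part of the skill's own interface.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n recorded attempts, c of which passed.

    Assumption: pass@k means "at least one of k attempts passes",
    estimated as 1 - C(n - c, k) / C(n, k). The skill itself may
    define or report the metric differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures exist, so any k-sized sample contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical eval run: 10 attempts, 3 passed.
    for k in (1, 5):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

Under this assumed definition, an eval that passes 3 of 10 attempts has a pass@1 of roughly 0.3, and the estimate rises as k grows because more attempts are allowed per sample.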