About
The Eval Harness skill lets developers practice eval-driven development (EDD) in Claude Code. It provides a structured framework for defining capability and regression tests in YAML, tracking performance with pass@k metrics, and keeping evals in a systematic directory layout. It is aimed at projects that need high reliability: it supports objective measurement of AI-generated code quality, verification of bug fixes, and checks that behavior stays stable during refactoring or dependency updates.
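
For readers unfamiliar with pass@k, the sketch below shows one common way to estimate it: given n recorded attempts at a task, of which c passed, it computes the probability that at least one of k attempts drawn from them passes. This is an illustration only. The function name `pass_at_k` and its parameters are hypothetical and not part of the skill, which may define or compute the metric differently (for example, by simply checking whether any of k runs succeeded).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts, c of which passed.

    Hypothetical helper for illustration; not the skill's own API.
    Uses the combinatorial estimator 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failures, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Otherwise subtract the fraction of size-k subsets that are all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 attempts, 3 passed: chance that at least 1 of 3 sampled attempts passes.
    print(round(pass_at_k(n=10, c=3, k=3), 3))
```

With 3 passes out of 10 attempts, pass@1 is 0.3 and pass@3 is about 0.71, which is why a capability eval can look much healthier at higher k than at k = 1.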