Introduction
The Eval Harness skill brings Eval-Driven Development (EDD) into your Claude Code workflow, treating evaluations as the unit tests of AI-assisted development. You define success criteria before implementation, then track two things against them: capability (does the new feature work?) and regression (did existing behavior break?). Results are scored with deterministic code-based graders, model-based qualitative reviews, and pass@k metrics. This structure helps ensure that AI-generated code is not only functional but also meets specific, repeatable quality standards across multiple sessions.
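To make the pass@k metric concrete, here is a minimal sketch of the commonly used unbiased estimator, pass@k = 1 - C(n - c, k) / C(n, k), where n is the number of attempts sampled and c is the number that passed the grader. The source does not show how the skill itself computes pass@k, so the function name `pass_at_k` and its parameters are illustrative assumptions rather than part of the skill.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k attempts passes.

    n: total attempts sampled for the eval (hypothetical parameter name)
    c: attempts among those n that passed the grader
    k: number of attempts the metric allows
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")

    # With fewer than k failing attempts, every size-k subset of the
    # n attempts must contain at least one pass.
    if n - c < k:
        return 1.0

    # Otherwise, subtract the fraction of size-k subsets that contain
    # only failing attempts.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 attempts of which 3 passed, `pass_at_k(10, 3, 1)` is approximately 0.3 and `pass_at_k(10, 3, 3)` is approximately 0.71. The estimate rises with k, which is why a pass@1 target is stricter than a pass@3 target for the same eval.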