About
Eval Harness is a specialized framework for Claude Code that adopts Eval-Driven Development (EDD), treating evaluations as the "unit tests" of AI-assisted software engineering. Developers define rigorous success criteria before implementation, grade results deterministically with code-based checks, and add model-based or human-led review for higher reliability. By tracking metrics such as pass@k and monitoring for regressions across sessions, the skill helps developers build more predictable and robust AI-generated features, and it keeps a historical record of system performance within the project directory.
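The description names pass@k without defining it. As a rough illustration, the sketch below assumes the common meaning (the probability that at least one of k sampled attempts passes) and uses the widely used combinatorial estimator computed from n recorded attempts with c passes. The function name `pass_at_k` and its parameters are hypothetical and are not taken from Eval Harness itself.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n recorded attempts, c of which passed.

    Hypothetical helper for illustration; not part of Eval Harness.
    """
    if not 0 <= c <= n:
        raise ValueError("require 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n attempts contains no passing attempt. math.comb returns 0
    # when k > n - c, so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Example: 10 attempts were recorded and 3 of them passed.
    for k in (1, 5):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

With 3 passes in 10 attempts, this gives a pass@1 of about 0.3 and a pass@5 of about 0.917 (1 - 21/252). Comparing such values between sessions is one plausible way to spot the regressions mentioned above.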