Overview
The Eval Harness skill brings formal evaluation to Claude Code, letting developers treat AI behavior like unit tests through Eval-Driven Development (EDD). Users define success criteria before writing code, verify outputs with deterministic or model-based graders, and track reliability metrics such as pass@k to measure consistency across repeated runs. The skill is aimed at teams that want to move beyond 'vibes-based' AI development toward a structured, regression-resistant workflow that keeps code quality high through complex refactors and feature additions.
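To make the grader and pass@k ideas concrete, here is a minimal sketch in Python. It is not the skill's own implementation: the names `exact_match_grader` and `pass_at_k` are hypothetical, and the formula is the commonly used unbiased pass@k estimator (the chance that at least one of k attempts, drawn from n recorded attempts with c passes, succeeds). The skill may compute the metric differently.

```python
from math import comb


def exact_match_grader(output: str, expected: str) -> bool:
    """Deterministic grader (hypothetical): pass if output matches after trimming whitespace."""
    return output.strip() == expected.strip()


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k attempts passes.

    n: total attempts recorded, c: attempts that passed, k: sample size.
    Uses 1 - C(n - c, k) / C(n, k); assumed here, not taken from the skill.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed, giving pass@k = 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative data only: five attempts at a task whose expected answer is "42".
outputs = ["42", "42 ", "41", "42", "forty-two"]
results = [exact_match_grader(o, "42") for o in outputs]
n, c = len(results), sum(results)

print(f"passed {c}/{n}")
print(f"pass@1 = {pass_at_k(n, c, 1):.2f}")
print(f"pass@3 = {pass_at_k(n, c, 3):.2f}")
```

A deterministic grader like this suits outputs with one correct answer. A model-based grader would replace the string comparison with a call to a judging model, leaving the pass@k bookkeeping unchanged.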