Overview
The Eval Framework skill provides a structured meta-framework for managing AI-driven evaluations such as architecture reviews, code audits, and security checks. It addresses the challenge of AI output variance by enforcing a strict YAML schema for findings, storing results in version-controlled files, and providing analytical tools that calculate overlap, precision, and recall between runs. Developers can use it to audit Claude's outputs, cross-validate findings across models (e.g., Opus vs. Sonnet), and track code quality over time through data-backed consistency scores and automated comparison reports.
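To make the comparison tooling concrete, here is a minimal Python sketch of how overlap, precision, and recall could be computed between two findings files. The file layout (a top-level `findings` list), the field names (`file`, `rule_id`, `severity`), and the paths under `runs/` are illustrative assumptions, not the skill's actual schema or CLI.

```python
import yaml  # pip install pyyaml


def load_findings(path: str) -> set[tuple[str, str]]:
    """Load a findings YAML file and key each finding by (file, rule_id).

    Assumed layout (hypothetical, not the skill's real schema):
        findings:
          - file: src/auth.py
            rule_id: SEC-012
            severity: high
    """
    with open(path) as f:
        data = yaml.safe_load(f)
    return {(item["file"], item["rule_id"]) for item in data["findings"]}


def compare_runs(baseline_path: str, candidate_path: str) -> dict:
    """Score a candidate run against a baseline run.

    Treating the baseline as ground truth:
        precision = matched / candidate findings
        recall    = matched / baseline findings
    """
    baseline = load_findings(baseline_path)
    candidate = load_findings(candidate_path)
    overlap = baseline & candidate
    return {
        "overlap": len(overlap),
        "precision": len(overlap) / len(candidate) if candidate else 0.0,
        "recall": len(overlap) / len(baseline) if baseline else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical paths; in practice these would be the
    # version-controlled result files from two evaluation runs.
    metrics = compare_runs("runs/opus.yaml", "runs/sonnet.yaml")
    print(f"overlap={metrics['overlap']}  "
          f"precision={metrics['precision']:.2f}  "
          f"recall={metrics['recall']:.2f}")
```

Keying findings on a stable identifier like `(file, rule_id)` is one way to tolerate wording variance between runs; fuzzier matching (e.g., on overlapping line ranges) would trade strictness for coverage.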