Integrated version control for evaluations via the .claude/evals directory
Standardized templates for Capability and Regression evaluations
Reliability tracking using pass@k and pass^k success metrics
Multiple grading modes, including deterministic code-based graders and AI model-based graders
Automated reporting and status generation for feature development
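
The reliability metrics named in the list can be illustrated with a small sketch. pass@k is commonly read as the chance that at least one of k attempts succeeds, and pass^k as the chance that all k attempts succeed. The code below assumes the standard combinatorial estimators computed from n recorded trials with c passes; the function names pass_at_k and pass_pow_k are hypothetical and are not taken from the tool itself, which may compute these figures differently.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n = total trials, c = passing trials, k = attempts considered
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k attempts passes.

    Hypothetical helper: 1 minus the fraction of k-sized subsets of the
    n trials that contain only failures.
    """
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that all k attempts pass (pass^k).

    Hypothetical helper: the fraction of k-sized subsets of the n trials
    that contain only passes.
    """
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


# Example: 10 recorded trials of one eval, 7 of which passed.
results = [True] * 7 + [False] * 3
n, c = len(results), sum(results)

print(round(pass_at_k(n, c, 3), 4))   # 0.9917
print(round(pass_pow_k(n, c, 3), 4))  # 0.2917
```

The gap between the two numbers is the point of tracking both: a task that usually succeeds within three tries (high pass@3) can still be unreliable when every run has to succeed (low pass^3).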