Automated templates for capability and regression evaluations
Support for deterministic code-based, model-based, and human graders
Standardized Eval-Driven Development (EDD) workflow
Pass@k and Pass^k metrics for reliability measurement
Project-level eval storage and standardized reporting
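The list does not define the Pass@k and Pass^k metrics, so the sketch below assumes their commonly used meanings: Pass@k estimates the chance that at least one of k attempts succeeds, and Pass^k estimates the chance that all k attempts succeed. Both are computed from n recorded trials of which c passed. The function names `pass_at_k` and `pass_pow_k` are hypothetical and are not taken from the tool itself.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n = total recorded trials, c = trials that passed, k = attempts considered
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k sampled trials passed.

    Assumed definition: 1 - C(n - c, k) / C(n, k), i.e. one minus the
    chance that all k trials drawn without replacement are failures.
    """
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """Estimated probability that all k sampled trials passed.

    Assumed definition: C(c, k) / C(n, k), i.e. the chance that k trials
    drawn without replacement are all successes.
    """
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical run: 10 trials of one eval, 5 of which passed.
    n, c = 10, 5
    for k in (1, 3):
        print(f"k={k}: pass@k={pass_at_k(n, c, k):.3f}, pass^k={pass_pow_k(n, c, k):.3f}")
```

Under these assumed definitions the two metrics coincide at k = 1 and diverge as k grows: Pass@k rises toward 1 while Pass^k falls toward 0. That is why Pass^k is the stricter signal when the goal is consistent behavior rather than occasional success.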