Formal Eval-Driven Development (EDD) framework integration
Continuous evaluation workflow with automated reporting
Standardized templates for capability and regression testing
Support for code-based, model-based, and human grading
Automated pass@k and pass^k reliability metrics tracking
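
The list does not define the two reliability metrics, so the following is only a minimal sketch under commonly used definitions, which are an assumption here: pass@k as the probability that at least one of k sampled attempts passes, and pass^k as the probability that all k sampled attempts pass. Both are estimated from c passing runs out of n recorded runs. The function names `pass_at_k` and `pass_pow_k` are hypothetical and are not taken from the framework.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # Guard against inputs for which the estimators are undefined.
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k attempts passes) from c passes in n runs.

    Assumed definition: 1 - C(n - c, k) / C(n, k), i.e. one minus the
    chance that a random k-subset of the n runs contains only failures.
    """
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k attempts pass) from c passes in n runs.

    Assumed definition: C(c, k) / C(n, k), i.e. the chance that a random
    k-subset of the n runs contains only passes.
    """
    _check(n, c, k)
    return comb(c, k) / comb(n, k)
```

Under these assumed definitions, 5 passes in 10 runs give pass@2 of about 0.78 but pass^2 of about 0.22. The gap shows why tracking both is useful: pass@k rewards occasional success, while pass^k rewards consistency.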