01 Support for deterministic code-based and probabilistic model-based grading
02 0 GitHub stars
03 Formal Evaluation-Driven Development (EDD) workflow implementation
04 Reliability tracking using pass@k and pass^k statistical metrics
05 Automated capability and regression testing frameworks
06 Standardized evaluation reporting and project-level eval storage
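
The list names pass@k and pass^k but does not define them. The sketch below assumes the commonly used definitions: pass@k is the probability that at least one of k sampled trials passes, and pass^k is the probability that all k sampled trials pass, both estimated from c passing trials out of n recorded trials. The function names `pass_at_k` and `pass_pow_k` are hypothetical and are not taken from this project.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # Shared argument validation: c passes out of n trials, sample size k.
    if n < 1 or not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require n >= 1, 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k trials passes.

    Uses the combinatorial estimator 1 - C(n - c, k) / C(n, k):
    one minus the fraction of k-subsets that contain only failures.
    """
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that all k trials pass (pass^k).

    Uses C(c, k) / C(n, k): the fraction of k-subsets that
    contain only passing trials.
    """
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    n, c = 10, 5  # assumed example: 5 passes recorded over 10 trials
    for k in (1, 2, 5):
        print(k, round(pass_at_k(n, c, k), 4), round(pass_pow_k(n, c, k), 4))
```

With 10 trials and 5 passes, the two metrics agree at k = 1 (both 0.5) and then diverge: pass@2 is about 0.78 while pass^2 is about 0.22. Under these definitions, pass@k rewards occasional success and suits capability checks, whereas pass^k rewards consistency and suits reliability or regression checks.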