01 Generates detailed evaluation reports and maintains project-level history
02 Multi-modal grading: code-based, model-based, and human review
03 Calculates reliability metrics such as pass@k and pass^k
04 Automates capability and regression evaluations
05 Supports Evaluation-Driven Development (EDD) workflows, in which evaluations are defined before coding begins
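The listing names pass@k and pass^k but does not define them. The sketch below assumes the common definitions. pass@k is the probability that at least one of k attempts succeeds. pass^k is the probability that all k attempts succeed. Both are estimated from n recorded attempts, of which c passed, by drawing k without replacement. The function names `pass_at_k` and `pass_hat_k` are hypothetical illustrations, not this tool's API, and the tool may compute these metrics differently.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # Validate that c correct out of n attempts and a sample size k make sense.
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Hypothetical helper: unbiased pass@k estimate.

    Probability that at least one of k attempts, drawn without replacement
    from n attempts with c passes, succeeds: 1 - C(n-c, k) / C(n, k).
    """
    _check(n, c, k)
    if n - c < k:
        # Too few failures to fill a k-sample, so every sample has a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Hypothetical helper: pass^k estimate.

    Probability that all k attempts, drawn without replacement from n
    attempts with c passes, succeed: C(c, k) / C(n, k).
    math.comb returns 0 when k > c, so the estimate is then 0.0.
    """
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Example: 10 recorded attempts on one task, 3 of which passed.
    for k in (1, 2, 3):
        print(k, round(pass_at_k(10, 3, k), 4), round(pass_hat_k(10, 3, k), 4))
```

With 3 passes in 10 attempts, pass@k rises as k grows, because more tries give more chances to succeed once. pass^k falls, because every try must succeed. That divergence is why the two are reported together: pass@k reflects capability, and pass^k reflects consistency.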