Eval-Driven Development (EDD) workflow integration
Multi-modal grading via code, AI models, and human review
Automated capability and regression testing templates
Reliability tracking with pass@1 and pass@k metrics
Standardized evaluation reporting and session logging
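
The pass@1 and pass@k metrics listed above can be illustrated with a small sketch. It assumes the commonly used unbiased estimator, pass@k = 1 - C(n-c, k) / C(n, k), where n is the number of attempts run and c is the number that passed. The list does not say how this tool computes the metric, so the function name `pass_at_k` and its parameters are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled attempts passes.

    n: total attempts run for a task (assumed parameter name)
    c: attempts that passed
    k: sample size being scored
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets made up only of failures.
    # It is 0 when fewer than k failures exist, which gives a result of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical run: 10 attempts, 3 of which passed.
    print(f"pass@1 = {pass_at_k(10, 3, 1):.3f}")
    print(f"pass@5 = {pass_at_k(10, 3, 5):.3f}")
```

With k = 1 the estimator reduces to the plain pass rate c / n, so pass@1 measures single-attempt reliability, while larger k shows how often a task succeeds when several attempts are allowed.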