01 Standardized evaluation specs for optimization loop integration
02 Rubric-driven grading workflows for LLM-as-judge and hybrid evaluators
03 Dataset strategy design covering production traces and synthetic cases
04 1 GitHub star
05 Structured metrics matrix for primary, constraint, and secondary goals
06 Deep integration with Langfuse for dataset and prompt management