01 Metric selection framework for objective and subjective tasks
02 LLM-as-Judge implementation patterns
03 Pairwise comparison protocols with consistency checks
04 Systematic bias mitigation (Position, Length, Verbosity)
05 Structured rubric generation and scale calibration
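
As an illustration of items 03 and 04, here is a minimal sketch of a pairwise comparison with a position-swap consistency check. It is not taken from the source: the `Judge` signature, the function `pairwise_compare`, and the two toy judges are hypothetical stand-ins for a real LLM call. The idea is to ask the judge twice with the responses in swapped order, and accept a winner only when both orderings agree.

```python
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]
# Hypothetical judge signature (an assumption, not a real library API):
# (prompt, response_in_slot_A, response_in_slot_B) -> verdict
Judge = Callable[[str, str, str], Verdict]


def pairwise_compare(judge: Judge, prompt: str, resp_1: str, resp_2: str) -> str:
    """Return "resp_1", "resp_2", or "tie" after a position-swap check."""
    first = judge(prompt, resp_1, resp_2)   # resp_1 shown in slot A
    second = judge(prompt, resp_2, resp_1)  # resp_2 shown in slot A

    # Map slot-based verdicts back to the underlying responses.
    first_pick = {"A": "resp_1", "B": "resp_2", "tie": "tie"}[first]
    second_pick = {"A": "resp_2", "B": "resp_1", "tie": "tie"}[second]

    # Accept a winner only if both orderings agree; otherwise call it a tie.
    return first_pick if first_pick == second_pick else "tie"


def position_biased_judge(prompt: str, a: str, b: str) -> Verdict:
    """Toy judge that always prefers whatever sits in slot A."""
    return "A"


def length_judge(prompt: str, a: str, b: str) -> Verdict:
    """Toy judge that prefers the longer response regardless of slot."""
    if len(a) == len(b):
        return "tie"
    return "A" if len(a) > len(b) else "B"


if __name__ == "__main__":
    print(pairwise_compare(position_biased_judge, "q", "x", "y"))
    print(pairwise_compare(length_judge, "q", "long answer", "short"))
```

The swap neutralizes a purely position-driven preference, since such a judge contradicts itself across the two orderings. It does not address length or verbosity bias: the toy `length_judge` is perfectly consistent under the swap and would still need a separate control.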