01 Standardized Direct Scoring and Pairwise Comparison protocols
02 Automated bias mitigation for position, length, and authority
03 Metric selection framework (F1, Cohen's κ, Spearman's ρ)
04 0 GitHub stars
05 Evidence-based Chain-of-Thought scoring patterns
06 Structured JSON output formatting for evaluation pipelines
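
As an illustration of the position-bias item in the list above, here is a minimal sketch of one common mitigation for pairwise comparison: judge the pair in both presentation orders and keep a verdict only if it survives the swap. The `Judge` signature, the `compare_with_swap` helper, and the two toy judges are hypothetical names used for this example, not part of any documented API, and the source does not say which mitigation it actually uses.

```python
from typing import Callable

# Hypothetical judge signature (an assumption for this sketch):
# takes (prompt, first_response, second_response) and returns
# "first", "second", or "tie".
Judge = Callable[[str, str, str], str]


def compare_with_swap(judge: Judge, prompt: str, a: str, b: str) -> str:
    """Return "A", "B", or "tie" after judging both presentation orders."""
    forward = judge(prompt, a, b)   # response A shown first
    backward = judge(prompt, b, a)  # response B shown first

    # Map positional verdicts back to the underlying responses.
    forward_pick = {"first": "A", "second": "B"}.get(forward, "tie")
    backward_pick = {"first": "B", "second": "A"}.get(backward, "tie")

    # Keep a verdict only if both orders agree; otherwise call it a tie.
    return forward_pick if forward_pick == backward_pick else "tie"


# Toy judges standing in for a real LLM call.
def always_first(prompt: str, first: str, second: str) -> str:
    """A maximally position-biased judge: always prefers the first slot."""
    return "first"


def longer_wins(prompt: str, first: str, second: str) -> str:
    """A position-independent judge: prefers the longer response."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


if __name__ == "__main__":
    print(compare_with_swap(always_first, "q", "x", "yy"))
    print(compare_with_swap(longer_wins, "q", "x", "yy"))
```

A purely position-biased judge collapses to a tie under this scheme, while a judge whose preference does not depend on order keeps its verdict. Note that the swap addresses position bias only; the second toy judge is length-biased, which would need a separate check.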