Multi-dimensional quality scorer design (accuracy, precision, anti-hallucination)
Golden dataset (golden-dataset.yaml) design and expansion guidance
Comprehensive failure mode analysis for hallucinations and overfitting
Generalization testing patterns to prevent prompt-example bias
Orthogonal test matrix construction using pairwise logic
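
To make the pairwise test matrix item above more concrete, here is a minimal sketch of one common way to build such a matrix: a greedy all-pairs selection. It is an illustration under stated assumptions, not necessarily the method this project uses. The function name `pairwise_matrix` and the factor names in the usage note (`language`, `input_length`, `domain`) are hypothetical.

```python
from itertools import combinations, product


def pairwise_matrix(factors):
    """Greedily pick test rows until every pair of factor values appears in some row.

    `factors` maps a factor name to its list of possible values (hypothetical
    input shape; every factor is assumed to have at least one value).
    """
    names = list(factors)

    # Every (factor=value, factor=value) combination that must be covered at least once.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in factors[a]
        for vb in factors[b]
    }

    # Candidate rows are drawn from the full Cartesian product. This is fine for
    # small matrices, but it grows quickly with the number of factors.
    candidates = [
        dict(zip(names, combo))
        for combo in product(*(factors[n] for n in names))
    ]

    def gain(row):
        # Number of still-uncovered pairs that this row would cover.
        return sum(
            1
            for (a, va), (b, vb) in uncovered
            if row[a] == va and row[b] == vb
        )

    rows = []
    while uncovered:
        best = max(candidates, key=gain)
        rows.append(best)
        # Remove every pair that the chosen row covers.
        uncovered -= {
            ((a, best[a]), (b, best[b])) for a, b in combinations(names, 2)
        }
    return rows
```

Calling `pairwise_matrix({"language": ["en", "de"], "input_length": ["short", "long"], "domain": ["legal", "medical"]})` returns a list of dictionaries, one per test case. Greedy selection does not guarantee the smallest possible matrix, but it usually yields far fewer rows than the full product while still exercising every two-way interaction.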