01 LLM-as-judge implementation for multi-dimensional output scoring
02 Pairwise comparison for A/B testing different model outputs
03 Configurable quality gates to block low-confidence AI responses
04 69 GitHub stars
05 Automated hallucination detection and factual grounding checks
06 RAGAS metrics support for faithfulness, relevancy, and context precision
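
To make the scoring and quality-gate items above concrete, here is a minimal sketch of how per-dimension judge scores might be checked against configurable thresholds. It is an illustration only, not this project's actual API: the names `GateResult` and `apply_quality_gate`, the 0 to 1 score scale, and the threshold values are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of a gate check (hypothetical type, not from the project)."""
    passed: bool
    failures: dict[str, float]  # dimension -> score that fell below its minimum


def apply_quality_gate(
    scores: dict[str, float], thresholds: dict[str, float]
) -> GateResult:
    """Block a response when any gated dimension scores below its minimum.

    A gated dimension missing from `scores` is treated as 0.0, so the gate
    fails closed rather than letting an unscored response through.
    """
    failures = {
        dim: scores.get(dim, 0.0)
        for dim, minimum in thresholds.items()
        if scores.get(dim, 0.0) < minimum
    }
    return GateResult(passed=not failures, failures=failures)


# Assumed judge output on a 0-1 scale; dimension names mirror the metrics
# listed above, but the values are made up for illustration.
scores = {"faithfulness": 0.91, "relevancy": 0.62, "context_precision": 0.80}
thresholds = {"faithfulness": 0.85, "relevancy": 0.70}

result = apply_quality_gate(scores, thresholds)
if not result.passed:
    print("blocked:", result.failures)
```

Only dimensions named in `thresholds` are gated, so a score such as `context_precision` can be recorded without blocking anything until a minimum is configured for it.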