01 Automated NLP metrics including BLEU, ROUGE, and BERTScore
02 LLM-as-Judge patterns for pairwise and pointwise scoring
03 Statistical A/B testing with Cohen's d effect size analysis
04 Retrieval-Augmented Generation (RAG) metrics like MRR and NDCG
05 Automated regression detection for CI/CD pipeline integration
06 1 GitHub star
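
Two of the items above name formulas that are easy to state concretely: Cohen's d for A/B comparisons, and MRR and NDCG for ranked retrieval. The following is a minimal sketch under stated assumptions, not the implementation this listing refers to. The function names (`cohens_d`, `mrr`, `ndcg`) are hypothetical, Cohen's d is assumed to use the pooled sample standard deviation, and NDCG is assumed to use linear gain with a log2 discount.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples (assumption: pooled sample variance).

    Raises ZeroDivisionError if both samples have zero variance.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def mrr(ranked_relevance):
    """Mean reciprocal rank.

    ranked_relevance: one list per query of 0/1 relevance flags in rank order.
    A query with no relevant result contributes 0.
    """
    total = 0.0
    for flags in ranked_relevance:
        # 1-based rank of the first relevant result, or None if there is none.
        rank = next((i for i, rel in enumerate(flags, start=1) if rel), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_relevance)


def ndcg(gains, k=None):
    """NDCG for one ranked list of graded gains (assumption: linear gain, log2 discount)."""
    k = len(gains) if k is None else k

    def dcg(values):
        return sum(g / math.log2(i + 1) for i, g in enumerate(values[:k], start=1))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    print(cohens_d([2, 4, 6], [1, 3, 5]))
    print(mrr([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
    print(ndcg([3, 2, 1]))
```

By the usual rule of thumb, a Cohen's d near 0.2 is read as a small effect, 0.5 as medium, and 0.8 as large, so an A/B gate might require both statistical significance and a minimum effect size before declaring a winner.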