01 Automated prompt optimization for classification and generation tasks
02 LLM-as-a-Judge patterns for automated qualitative assessment
03 Row-level text descriptors for sentiment, length, and validity
04 Comprehensive HTML and JSON reporting for performance monitoring
05 RAG quality evaluation focusing on relevance and factuality
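Item 03 refers to row-level text descriptors. The sketch below is a rough, plain-Python illustration of the idea only. It is not the API of any particular evaluation library. It assumes a hypothetical toy sentiment lexicon (a real descriptor would typically use a trained model) and treats "validity" as "the text parses as JSON". Each input row gets its own set of scalar descriptors, and these can later be aggregated into a report.

```python
import json
from dataclasses import dataclass

# Hypothetical toy lexicon, for illustration only; not a real sentiment model.
POSITIVE = {"good", "great", "excellent", "helpful", "love"}
NEGATIVE = {"bad", "poor", "terrible", "useless", "hate"}


@dataclass
class RowDescriptors:
    length_chars: int
    word_count: int
    sentiment: float  # in [-1, 1]; 0.0 when no lexicon words appear
    is_valid_json: bool


def sentiment_score(text: str) -> float:
    """Naive lexicon score: (positive hits - negative hits) / total hits."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def is_valid_json(text: str) -> bool:
    """Assumed validity check: the row parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def describe(text: str) -> RowDescriptors:
    """Compute all descriptors for a single row of text."""
    return RowDescriptors(
        length_chars=len(text),
        word_count=len(text.split()),
        sentiment=sentiment_score(text),
        is_valid_json=is_valid_json(text),
    )


rows = [
    "The answer was great and helpful!",
    '{"label": "positive"}',
    "Terrible, useless reply",
]
for row in rows:
    print(describe(row))
```

Returning one small record per row keeps the descriptors independent of any reporting format. The same records could be summarized into tables, HTML, or JSON downstream.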