01 0 GitHub stars
02 Support for Anthropic Claude 3.5 and 4.5 model families
03 Automated LLM output testing and side-by-side comparison
04 LLM-as-judge capabilities via llm-rubric for qualitative grading
05 Extensive assertion library including Regex, JSON, and Semantic Similarity
06 Integrated web UI for visualizing evaluation results and metrics
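
To make the assertion types listed above concrete, here is a minimal Python sketch of what regex, JSON, and similarity checks on a model output might look like. It is not the tool's own API: the function names (`check_regex`, `check_is_json`, `check_similar`) and the 0.8 threshold are hypothetical, and the similarity check uses a bag-of-words cosine as a simple stand-in for the embedding-based comparison a real semantic-similarity assertion would likely use.

```python
import json
import math
import re
from collections import Counter


def check_regex(output: str, pattern: str) -> bool:
    """Pass if the pattern matches anywhere in the model output."""
    return re.search(pattern, output) is not None


def check_is_json(output: str) -> bool:
    """Pass if the model output parses as valid JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over lowercase word counts.

    Assumption: this word-count vector is only a stand-in for real embeddings.
    """
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def check_similar(output: str, expected: str, threshold: float = 0.8) -> bool:
    """Pass if the output is at least `threshold` similar to the expected text."""
    return cosine_similarity(output, expected) >= threshold
```

Each check returns a plain boolean, so several can be combined per test case and tallied into a pass rate. A qualitative, LLM-as-judge check would need a call to a grading model and is left out of this sketch.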