01 Supports both human review and automated scoring configurations
02 Facilitates golden set creation for regression and A/B testing
03 Streamlines LLM-as-judge prompt setup and management
04 Automates dataset creation with purpose-driven metadata
05 Configures evaluation dimensions such as accuracy, tone, and completeness
06 1 GitHub star
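
To make the idea of configurable evaluation dimensions more concrete, here is a minimal sketch of what such a configuration could look like. It is not taken from the tool itself: the names `Dimension`, `EvalConfig`, `scorer`, and `aggregate`, the weights, and the sample scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    # Hypothetical structure: one evaluation dimension and its relative weight.
    name: str            # e.g. "accuracy", "tone", "completeness"
    weight: float = 1.0


@dataclass
class EvalConfig:
    # Hypothetical structure: which dimensions to score and who scores them.
    dimensions: list[Dimension]
    scorer: str = "human"  # assumed labels: "human" or "llm_judge"

    def aggregate(self, scores: dict[str, float]) -> float:
        """Return the weighted mean of the per-dimension scores."""
        missing = [d.name for d in self.dimensions if d.name not in scores]
        if missing:
            raise ValueError(f"missing scores for: {missing}")
        total_weight = sum(d.weight for d in self.dimensions)
        weighted_sum = sum(scores[d.name] * d.weight for d in self.dimensions)
        return weighted_sum / total_weight


# Illustrative configuration: accuracy counts twice as much as the others.
config = EvalConfig(
    dimensions=[
        Dimension("accuracy", weight=2.0),
        Dimension("tone"),
        Dimension("completeness"),
    ],
    scorer="llm_judge",
)

# Made-up scores on a 1-5 scale for a single response.
overall = config.aggregate({"accuracy": 4.0, "tone": 3.0, "completeness": 5.0})
print(overall)
```

The same configuration object could in principle be reused for human review and for an LLM judge, with only the `scorer` field changing, so that scores from both sources stay comparable across a golden set.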