01 0 GitHub stars
02 Pre-production batch testing for agent validation
03 Custom LLM-as-judge patterns for domain-specific quality metrics
04 Continuous production monitoring with CloudWatch integration
05 13 Built-in evaluators including Correctness, Safety, and Helpfulness
06 Detailed scoring for tool selection and parameter accuracy
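
To make the last item concrete, here is a minimal sketch of how tool selection and parameter accuracy could be scored for a single agent step. It is an illustration only and not the product's API: `ToolCall`, `score_tool_call`, and the metric names are hypothetical, and the scoring rule (exact name match, fraction of expected parameters reproduced) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical record of one tool invocation made by an agent."""
    name: str
    params: dict[str, Any] = field(default_factory=dict)


def score_tool_call(expected: ToolCall, actual: ToolCall) -> dict[str, float]:
    """Score one tool call against a reference call (assumed scoring rule).

    tool_selection: 1.0 if the agent picked the expected tool, else 0.0.
    parameter_accuracy: fraction of expected parameters that the agent
    supplied with an equal value. Extra parameters are not penalized.
    """
    selection = 1.0 if expected.name == actual.name else 0.0

    if selection == 0.0:
        # Parameters of the wrong tool are not comparable.
        param_accuracy = 0.0
    elif not expected.params:
        # Nothing to match, so the call is trivially accurate.
        param_accuracy = 1.0
    else:
        matched = sum(
            1
            for key, value in expected.params.items()
            if key in actual.params and actual.params[key] == value
        )
        param_accuracy = matched / len(expected.params)

    return {"tool_selection": selection, "parameter_accuracy": param_accuracy}


expected = ToolCall("get_weather", {"city": "Paris", "unit": "celsius"})
actual = ToolCall("get_weather", {"city": "Paris", "unit": "fahrenheit"})
result = score_tool_call(expected, actual)
```

Here `result` reports a correct tool choice with half of the expected parameters matched. A real evaluator would likely also handle type coercion, optional parameters, and multi-step tool sequences.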