01. Support for binary, categorical, and continuous evaluation dimensions
02. Automated deployment of live evaluation endpoints with secure API key management
03. Interactive scoping protocol with structured pass/fail criteria design
04. Automatic generation of companion skills and runnable cURL commands for workflow integration
05. 3 GitHub stars
06. Seed labeling workflow to calibrate LLM judge accuracy using real or synthetic traces
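
To make the three dimension types and the seed-labeling calibration step more concrete, here is a minimal Python sketch. It is not the tool's actual API: the `Dimension` class, the `agreement` function, and the `tolerance` parameter are hypothetical names chosen for illustration, and measuring calibration as a simple agreement rate against seed labels is an assumption about how one might do it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Dimension:
    """Hypothetical model of one evaluation dimension (not the tool's real schema)."""
    name: str
    kind: str                 # "binary", "categorical", or "continuous"
    categories: tuple = ()    # allowed labels, used only for categorical dimensions
    low: float = 0.0          # inclusive bounds, used only for continuous dimensions
    high: float = 1.0

    def validate(self, value) -> bool:
        """Return True if value is a legal score for this dimension."""
        if self.kind == "binary":
            return isinstance(value, bool)
        if self.kind == "categorical":
            return value in self.categories
        if self.kind == "continuous":
            # bool is a subclass of int, so exclude it explicitly
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            return is_number and self.low <= value <= self.high
        raise ValueError(f"unknown dimension kind: {self.kind!r}")


def agreement(dim: Dimension, judge: Sequence, seed: Sequence,
              tolerance: float = 0.1) -> float:
    """Fraction of traces where the judge's score matches the seed label.

    Binary and categorical scores must match exactly; continuous scores
    count as a match when they differ by at most `tolerance` (an assumed rule).
    """
    if len(judge) != len(seed) or len(seed) == 0:
        raise ValueError("judge and seed must be non-empty and the same length")
    for value in (*judge, *seed):
        if not dim.validate(value):
            raise ValueError(f"{value!r} is not valid for dimension {dim.name!r}")
    if dim.kind == "continuous":
        hits = sum(abs(j - s) <= tolerance for j, s in zip(judge, seed))
    else:
        hits = sum(j == s for j, s in zip(judge, seed))
    return hits / len(seed)


# Example: a binary pass/fail dimension checked against four seed-labeled traces.
resolved = Dimension(name="resolved", kind="binary")
rate = agreement(resolved, [True, False, True, True], [True, False, False, True])
print(rate)
```

In this sketch, a low agreement rate on the seed set would be a signal to revise the judge's criteria before trusting its scores on unlabeled traces.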