01 Contract-driven evaluation for consistent agent performance measurement
02 Strict rollback policies and guardrail checks to prevent performance regressions
03 Deep diagnostic integration for trace-level retrieval and metric normalization
04 Automated failure analysis and trend diagnosis against performance baselines
06 Configurable lever cardinality supporting both single and multi-variable experiments