01. Statistical distribution analysis of stochastic agent behaviors
02. Production reliability metrics, including p95 latency and consistency scores
03. 31,721 GitHub stars
04. Integration with industry benchmarks such as AgentBench and Tau-bench
05. Adversarial testing patterns for identifying prompt-injection risks
06. Behavioral contract testing with must/must-not assertions
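Because agent behavior is stochastic, a single pass/fail run says little. One common way to summarize repeated runs of the same task is a pass rate with a confidence interval. The sketch below uses the Wilson score interval for that purpose. It is an illustration under that assumption, not this project's API: `wilson_interval` and `summarize_runs` are hypothetical names.

```python
from __future__ import annotations

import math
from typing import Sequence


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


def summarize_runs(outcomes: Sequence[bool]) -> dict:
    """Hypothetical helper: summarize N repeated runs of one task as a pass-rate distribution."""
    n = len(outcomes)
    k = sum(outcomes)
    low, high = wilson_interval(k, n)
    return {"runs": n, "passes": k, "pass_rate": k / n, "ci95": (low, high)}


# Example: 17 passes out of 20 repeated runs of the same task.
summary = summarize_runs([True] * 17 + [False] * 3)
print(summary["pass_rate"], summary["ci95"])
```

With 17 of 20 runs passing, the point estimate is 0.85. The interval (roughly 0.64 to 0.95) makes it clear how much that number could move with more runs. The Wilson interval is chosen here because it stays inside [0, 1] and behaves sensibly at small sample sizes.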
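The list does not define p95 latency or a consistency score. Below is a minimal sketch of one plausible reading of each. p95 uses the nearest-rank percentile. The "consistency score" is assumed to be the share of runs whose normalized output matches the most common output. Both function names are hypothetical, and the consistency definition is an assumption rather than the project's formula.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def consistency_score(outputs: Sequence[str]) -> float:
    """Assumed metric: fraction of runs matching the most common output
    after lowercasing and collapsing whitespace."""
    if not outputs:
        raise ValueError("need at least one output")
    normalized = [" ".join(o.lower().split()) for o in outputs]
    _, modal_count = Counter(normalized).most_common(1)[0]
    return modal_count / len(normalized)


latencies_ms = [120, 135, 150, 160, 170, 180, 190, 210, 250, 900]
p95 = percentile_nearest_rank(latencies_ms, 95)
p50 = percentile_nearest_rank(latencies_ms, 50)

outputs = ["Refund issued", "refund  issued", "Refund issued", "Escalated to human"]
score = consistency_score(outputs)
```

Here p95 is 900 ms, while the median is only 170 ms. A single slow run dominates the tail, and that is exactly what a p95 metric is meant to surface. Three of the four outputs agree after normalization, so the consistency score is 0.75.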
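Must/must-not assertions can be read as a contract over an agent's response. Some patterns are required to appear, and others must never appear; a forbidden pattern can, for example, flag compliance with an injected instruction. The sketch below expresses such a contract as regular expressions. `BehavioralContract` and its fields are hypothetical illustrations, not the project's actual assertion syntax.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class BehavioralContract:
    """Hypothetical contract: regex patterns a response must / must not match."""
    must: list[str] = field(default_factory=list)
    must_not: list[str] = field(default_factory=list)

    def check(self, response: str) -> list[str]:
        """Return all violations; an empty list means the contract holds."""
        violations = []
        for pattern in self.must:
            if not re.search(pattern, response, re.IGNORECASE):
                violations.append(f"missing required behavior: {pattern!r}")
        for pattern in self.must_not:
            if re.search(pattern, response, re.IGNORECASE):
                violations.append(f"forbidden behavior present: {pattern!r}")
        return violations


contract = BehavioralContract(
    must=[r"order #\d+"],                      # must cite the order number
    must_not=[r"\bpassword\b", r"ignore (all|previous) instructions"],
)
good = contract.check("Your order #1042 has shipped.")
bad = contract.check("Sure, your password is hunter2.")
```

`check` collects every violation rather than stopping at the first one. A failing run then reports all broken clauses at once: here `good` is empty, while `bad` lists both the missing order number and the forbidden mention of a password.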