01 Detailed A/B comparison between baseline and candidate runs to detect regressions
02 Advanced jq-powered querying for custom data filtering and tool frequency analysis
03 Execution path visualization showing LLM calls and tool invocations
04 Percentile-based statistical analysis for score, latency, and cost metrics
05 Chronological listing and inspection of AgentV evaluation result files
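Items 01 and 04 fit together naturally: summarize each run with percentiles, then flag any percentile that moves in the bad direction by more than a tolerance. The following is a minimal sketch of that idea, not AgentV's implementation. The record fields (`score`, `latency_ms`), the nearest-rank percentile definition, the hypothetical helpers `percentile` and `compare_runs`, and the `tolerance` threshold are all illustrative assumptions. They do not describe AgentV's actual result file format or algorithm.

```python
import math

# Hypothetical sketch: the record fields ("score", "latency_ms") and the
# nearest-rank percentile method are assumptions, not AgentV's actual format.


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of values <= it."""
    if not values:
        raise ValueError("percentile of empty list")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted data
    return ordered[rank - 1]


def compare_runs(baseline, candidate, metric, ps=(50, 90), tolerance=0.0, higher_is_better=True):
    """Compare one metric between two runs at each percentile in ps.

    Returns {p: (baseline_value, candidate_value, regressed)}, where a regression
    is a change in the bad direction that exceeds tolerance.
    """
    report = {}
    for p in ps:
        b = percentile([r[metric] for r in baseline], p)
        c = percentile([r[metric] for r in candidate], p)
        delta = c - b
        # For scores, lower is worse; for latency or cost, higher is worse.
        regressed = delta < -tolerance if higher_is_better else delta > tolerance
        report[p] = (b, c, regressed)
    return report


# Illustrative per-test records for two runs (made-up numbers).
baseline = [
    {"score": s, "latency_ms": l}
    for s, l in zip([0.8, 0.9, 0.7, 1.0, 0.85], [1200, 900, 1500, 1100, 1000])
]
candidate = [
    {"score": s, "latency_ms": l}
    for s, l in zip([0.6, 0.9, 0.7, 1.0, 0.75], [1000, 950, 2100, 1050, 1150])
]

score_report = compare_runs(baseline, candidate, "score", tolerance=0.05)
latency_report = compare_runs(
    baseline, candidate, "latency_ms", tolerance=100, higher_is_better=False
)
print(score_report)    # p50 drops 0.85 -> 0.75 (regressed); p90 stays at 1.0
print(latency_report)  # p50 1100 -> 1050 (ok); p90 1500 -> 2100 (regressed)
```

Nearest-rank percentiles always return an observed value and need no interpolation. With only a handful of test cases, however, the upper percentiles are coarse, so a real comparison would likely want more samples or a different quantile method.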