Can I use this to find regressions in my AI agent?

Yes, it uses the 'agentv compare' command to provide a side-by-side analysis of baseline and candidate runs, highlighting specific tests where the score has decreased.
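To make the idea concrete, here is a minimal Python sketch of the kind of delta check such a comparison performs. It is not the `agentv compare` implementation: the field names `test_id` and `score`, and the helper names `load_scores` and `find_regressions`, are assumptions chosen for illustration, and real AgentV result files may use a different schema.

```python
import json


def load_scores(lines):
    """Map test id -> score from JSONL lines (field names are assumed)."""
    scores = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        scores[record["test_id"]] = record["score"]
    return scores


def find_regressions(baseline, candidate, tolerance=0.0):
    """Return (test_id, baseline, candidate, delta) for tests whose score dropped."""
    regressions = []
    for test_id, base_score in baseline.items():
        if test_id not in candidate:
            continue  # test missing from the candidate run; not comparable
        delta = candidate[test_id] - base_score
        if delta < -tolerance:
            regressions.append((test_id, base_score, candidate[test_id], delta))
    # Largest drop first
    return sorted(regressions, key=lambda r: r[3])


# Hypothetical sample data standing in for two result files
baseline_lines = [
    '{"test_id": "greeting", "score": 0.9}',
    '{"test_id": "refund", "score": 0.8}',
]
candidate_lines = [
    '{"test_id": "greeting", "score": 0.95}',
    '{"test_id": "refund", "score": 0.5}',
]

regressions = find_regressions(load_scores(baseline_lines), load_scores(candidate_lines))
for test_id, base, cand, delta in regressions:
    print(f"{test_id}: {base:.2f} -> {cand:.2f} ({delta:+.2f})")
```

The `tolerance` parameter is a hypothetical knob for ignoring small score fluctuations between runs.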

What file formats does this skill analyze?

It primarily analyzes the JSONL result files produced by AgentV evaluations. Its own output can be rendered as a table for reading or as JSON for further processing.

Does it support data visualization of agent steps?

It provides a tree-view visualization of execution paths, showing the sequence of LLM reasoning calls interspersed with tool invocations for easier debugging.
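As a rough illustration of what a tree view of an execution path can look like, the Python sketch below renders a nested trace as indented lines. The trace structure (`type`, `name`, `children`) and the function `render_tree` are hypothetical and are not taken from AgentV.

```python
def render_tree(node, prefix="", is_last=True, is_root=True):
    """Return the lines of a tree view for a nested step dict (assumed shape)."""
    label = f"{node['type']}: {node['name']}"
    if is_root:
        lines = [label]
        child_prefix = ""
    else:
        connector = "└── " if is_last else "├── "
        lines = [prefix + connector + label]
        # Continue the vertical bar only if more siblings follow
        child_prefix = prefix + ("    " if is_last else "│   ")
    children = node.get("children", [])
    for i, child in enumerate(children):
        lines.extend(render_tree(child, child_prefix, i == len(children) - 1, False))
    return lines


# Hypothetical trace: LLM reasoning calls interleaved with a tool invocation
trace = {
    "type": "run",
    "name": "refund-test",
    "children": [
        {"type": "llm", "name": "plan"},
        {"type": "tool", "name": "lookup_order"},
        {"type": "llm", "name": "answer"},
    ],
}

tree_lines = render_tree(trace)
print("\n".join(tree_lines))
```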

What is AgentV Trace Analyst?

It is a specialized capability for Claude Code that allows for the detailed analysis of AI agent evaluation traces, result files, and performance metrics generated by the AgentV framework.

How does it help with LLM cost optimization?

The skill identifies tests with high token usage and computes cost statistics, allowing developers to find expensive outliers and evaluate if cheaper models can maintain performance.
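One simple way to flag expensive outliers is to compare each test's token usage against the median for the run. The Python sketch below assumes each result record carries a `total_tokens` field and uses a hypothetical `factor` threshold. Neither is confirmed by AgentV, and the skill may use a different method.

```python
import statistics


def find_cost_outliers(records, factor=2.0):
    """Return records whose token usage exceeds factor x the median (assumed rule)."""
    median_tokens = statistics.median(r["total_tokens"] for r in records)
    threshold = factor * median_tokens
    return [r for r in records if r["total_tokens"] > threshold]


# Hypothetical per-test results
records = [
    {"test_id": "greeting", "total_tokens": 1200},
    {"test_id": "refund", "total_tokens": 1500},
    {"test_id": "multi-step-search", "total_tokens": 9000},
    {"test_id": "faq", "total_tokens": 1100},
]

outliers = find_cost_outliers(records)
for r in outliers:
    print(f"{r['test_id']}: {r['total_tokens']} tokens")
```

Tests flagged this way are natural candidates for rerunning against a cheaper model and comparing scores.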

AgentV Trace Analyst

Name: AgentV Trace Analyst
Author: EntityProcess


Analytics and Monitoring

Analyzes AI agent evaluation traces and performance metrics to optimize LLM application behavior and cost.

The AgentV Trace Analyst skill provides deep visibility into AI agent performance by processing AgentV evaluation traces and result files. It enables developers to compute detailed percentile statistics for scores, latency, and costs, visualize execution paths through tree views, and perform head-to-head A/B comparisons between different runs. By identifying failure patterns, tool usage bottlenecks, and regression deltas, it helps refine agent logic and ensure high-quality, cost-effective tool trajectories.
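For readers unfamiliar with percentile statistics, the following Python sketch shows how p50, p90, and p99 can be computed for a metric such as latency using the standard library. The function name `percentile_summary` and the sample values are illustrative assumptions, not part of the skill.

```python
import statistics


def percentile_summary(values):
    """Return p50, p90 and p99 for a list of numeric values."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}


# Hypothetical latencies in milliseconds, one per test
latencies_ms = [420, 380, 510, 2900, 450, 470, 390, 600, 430, 410]
summary = percentile_summary(latencies_ms)
for name, value in summary.items():
    print(f"{name}: {value:.0f} ms")
```

A large gap between p50 and p99 usually points to a few slow outliers rather than a uniformly slow agent.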

Key Features

01 Detailed A/B comparison between baseline and candidate runs to detect regressions

02 Advanced jq-powered querying for custom data filtering and tool frequency analysis

03 Execution path visualization showing LLM calls and tool invocations

04 Percentile-based statistical analysis for score, latency, and cost metrics

05 Chronological listing and inspection of AgentV evaluation result files

06 11 GitHub stars

Use Cases

01 Debugging agent failures by inspecting failed assertions and execution trajectories

02 Validating prompt and configuration changes through regression testing and delta analysis

03 Optimizing LLM costs and performance by identifying high-latency and high-token outliers


Install with 🐟 Skill.Fish

npx skillfish add entityprocess/agentv agentv-trace-analyst

For use in Claude.ai and ChatGPT

Download Skill