Can I run evaluations with custom data files?

Yes, the run_eval.py script allows you to run evaluations by passing a JSON data file via the --data-file argument.
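As a rough sketch of that workflow, the Python below writes a small JSON data file and assembles the command line that passes it to run_eval.py. The record fields (input, expected), the file name eval_data.json, and the helper names are assumptions made for illustration; the schema the script actually expects is not stated here and should be checked against the script itself.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records: the "input"/"expected" field names are an assumption,
# not a schema confirmed by run_eval.py.
records = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]


def write_data_file(rows, directory):
    """Write rows to a JSON file in the given directory and return its path."""
    path = Path(directory) / "eval_data.json"
    path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return path


def build_command(data_file):
    """Assemble the argv list that passes the file via --data-file."""
    return ["python", "run_eval.py", "--data-file", str(data_file)]


data_path = write_data_file(records, tempfile.mkdtemp())
command = build_command(data_path)
print(" ".join(command))
```

The resulting list can be handed to subprocess.run from the directory that contains run_eval.py.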

Why are my logs not appearing in the Braintrust dashboard?

When logging through the SDK, call logger.flush() at the end of your script so that all buffered data is sent to the Braintrust servers before the process exits.
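One way to make sure the flush always happens is a try/finally wrapper. The sketch below is illustrative only: run_and_flush and RecordingLogger are hypothetical names, and RecordingLogger is a stand-in so the pattern can run without network access. With the real SDK, the logger would typically be the object returned by braintrust.init_logger.

```python
def run_and_flush(logger, work):
    """Run work(logger) and flush the logger even if work raises."""
    try:
        return work(logger)
    finally:
        logger.flush()


class RecordingLogger:
    """Stand-in logger (not the Braintrust SDK) that buffers until flushed."""

    def __init__(self):
        self.buffer = []
        self.sent = []

    def log(self, **event):
        # Events are only buffered here, mimicking deferred delivery.
        self.buffer.append(event)

    def flush(self):
        # Move everything buffered so far into the "sent" list.
        self.sent.extend(self.buffer)
        self.buffer.clear()


demo = RecordingLogger()
run_and_flush(demo, lambda lg: lg.log(input="hi", output="hello"))
print(len(demo.sent))  # 1
```

Because the flush sits in a finally clause, events logged before an exception are still delivered.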

What are the common SQL quirks in Braintrust?

Braintrust SQL uses dot notation for nested fields (e.g., metadata.user_id) and specific time functions like hour() or day() instead of date_trunc().
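To make the two quirks concrete, the sketch below builds a query string that filters on a nested field with dot notation and uses hour() for the time bucket. Only metadata.user_id and hour() come from the text above; the table name, the id and created columns, the quote-escaping rule, and the helper name are assumptions, so treat the output as a shape to adapt rather than confirmed syntax.

```python
def user_logs_by_hour_query(table, user_id):
    """Build a query selecting one user's rows with an hour bucket."""
    # Standard SQL escaping of single quotes; assumed to apply here as well.
    safe_user = user_id.replace("'", "''")
    return (
        # hour(created) is used where other dialects would use date_trunc().
        "SELECT id, hour(created) AS hour_bucket "
        f"FROM {table} "
        # Nested metadata is addressed with dot notation.
        f"WHERE metadata.user_id = '{safe_user}'"
    )


# "logs" is a placeholder table name, not a confirmed identifier.
query = user_logs_by_hour_query("logs", "user-123")
print(query)
```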

How do I authenticate the Braintrust skill?

You need to create a .env file in your project directory containing your BRAINTRUST_API_KEY.
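A quick way to check that the file is in place is to parse it and confirm the key is present. The sketch below uses a minimal standard-library parser for simple KEY=VALUE lines; load_env_file and require_api_key are hypothetical helpers, the key value is a placeholder, and the skill's own scripts may read the file differently.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path):
    """Return BRAINTRUST_API_KEY from the file, or raise if absent or empty."""
    key = load_env_file(path).get("BRAINTRUST_API_KEY", "")
    if not key:
        raise RuntimeError("BRAINTRUST_API_KEY is missing from " + str(path))
    return key


# Demo with a throwaway file and a placeholder value, not a real key.
env_path = Path(tempfile.mkdtemp()) / ".env"
env_path.write_text(
    "# Braintrust credentials\nBRAINTRUST_API_KEY=sk-placeholder\n",
    encoding="utf-8",
)
os.environ["BRAINTRUST_API_KEY"] = require_api_key(env_path)
print(os.environ["BRAINTRUST_API_KEY"])  # sk-placeholder
```

Keep the real .env file out of version control, since it holds a secret.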

How do I query logs using the Braintrust skill?

You can use the query_logs.py script to run SQL queries against your Braintrust logs. It supports standard SQL operators along with specialized syntax for metadata and time ranges.

Braintrust LLM Observability

Name: Braintrust LLM Observability
Author: braintrustdata


Analytics & Monitoring

Integrates Claude Code with Braintrust to evaluate, log, and monitor LLM application performance using SQL and automated scripts.

The Braintrust skill empowers developers to implement robust LLM observability directly within their Claude Code environment. It provides a suite of scripts for querying logs with specialized SQL, running comprehensive evaluations, and logging input/output data for better model transparency. By bridging the gap between development and monitoring, this skill allows users to analyze performance trends, filter logs by metadata, and iterate on LLM prompts with data-driven insights, ensuring production-grade reliability for AI applications.

Key Features

01 Streamline SDK integration with standardized implementation patterns

02 Log model inputs, outputs, and metadata for real-time observability

03 3 GitHub stars

04 Query Braintrust logs using specialized SQL syntax for rapid analysis

05 List and manage Braintrust projects directly from the CLI

06 Run automated LLM evaluations with built-in or custom scorers

Use Cases

01 Debugging LLM outputs by querying historical logs with SQL filters

02 Measuring model accuracy using built-in evaluation scripts and scorers like Factuality

03 Monitoring production LLM usage patterns and performance metrics through project logs


Install with 🐟 Skill.Fish

npx skillfish add braintrustdata/braintrust-claude-plugin using-braintrust

For use in Claude.ai and ChatGPT
