Can I use this to find failing traces?

Not on its own. This skill manages datasets; pair it with the data-retrieval skill to find failing traces first, then move them into a dataset for long-term regression testing.

Does this skill support the Langfuse evaluation infrastructure?

Yes, it is specifically designed to work with the eval_infra_v1 metadata schema, including support for score scales, thresholds, and judge prompts.
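
As a rough illustration of what such metadata might carry, the sketch below models a score scale, a pass threshold, and a judge prompt in Python. The field names (`schema_version`, `score_scale`, `threshold`, `judge_prompt`) and the `passes` helper are hypothetical, chosen only for this example; the actual eval_infra_v1 schema is not spelled out here and may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvalContract:
    """Hypothetical stand-in for eval_infra_v1 metadata; field names are assumptions."""

    schema_version: str
    score_scale: Tuple[float, float]  # (min, max) score a judge may assign
    threshold: float                  # minimum score that counts as a pass
    judge_prompt: str                 # instructions given to the LLM judge

    def __post_init__(self) -> None:
        low, high = self.score_scale
        if not low < high:
            raise ValueError("score_scale must be (min, max) with min < max")
        if not low <= self.threshold <= high:
            raise ValueError("threshold must lie within score_scale")

    def passes(self, score: float) -> bool:
        """Return True when the score is on the scale and meets the threshold."""
        low, high = self.score_scale
        if not low <= score <= high:
            raise ValueError(f"score {score} is outside {self.score_scale}")
        return score >= self.threshold


# Illustrative values only; none of these come from the skill itself.
contract = EvalContract(
    schema_version="eval_infra_v1",
    score_scale=(0.0, 10.0),
    threshold=7.0,
    judge_prompt="Rate the answer for factual accuracy from 0 to 10.",
)
```

Validating the threshold against the scale at construction time means a misconfigured contract fails early, before any trace is scored.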

What is the Langfuse Dataset Management skill?

It is a specialized capability for Claude Code that allows you to create and manage Langfuse datasets, curate traces for testing, and build regression sets directly from your development environment.

How do I add multiple traces to a dataset at once?

Use the 'add-batch' command, which accepts a text file containing trace IDs. This lets you move large numbers of traces into a dataset for validation in a single step.
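
Before handing a file to 'add-batch', it can help to clean up the list of trace IDs. The Python sketch below assumes one trace ID per line, which the source does not confirm, and the helper name `load_trace_ids` and the sample IDs are hypothetical. It strips whitespace, skips blank lines, and drops duplicates while keeping the original order.

```python
from typing import List


def load_trace_ids(text: str) -> List[str]:
    """Parse trace IDs from text, assuming one ID per line (an assumption)."""
    seen = set()
    ids: List[str] = []
    for line in text.splitlines():
        trace_id = line.strip()
        # Skip blank lines and IDs that were already collected.
        if not trace_id or trace_id in seen:
            continue
        seen.add(trace_id)
        ids.append(trace_id)
    return ids


# Hypothetical IDs for illustration; real trace IDs will look different.
raw = "trace-001\n\n  trace-002  \ntrace-001\n"
cleaned = load_trace_ids(raw)

# Text ready to be written to the file passed to 'add-batch'.
file_body = "\n".join(cleaned) + "\n"
```

Removing duplicates up front avoids adding the same trace to a dataset twice if the underlying command does not deduplicate on its own.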

Langfuse Dataset Manager

Name: Langfuse Dataset Manager
Author: mberto10

Analytics & Monitoring

Manages Langfuse datasets for AI regression testing and golden set curation directly through the Claude Code CLI.

The Langfuse Dataset Management skill enables developers to bridge the gap between LLM observability and evaluation by streamlining the curation of production traces into structured datasets. It provides a powerful command-line interface for creating datasets, batch-adding traces for regression testing, and maintaining 'golden sets' of high-quality model outputs. By automating the extraction of trace inputs and metadata into schemas like eval_infra_v1, this skill simplifies the creation of validation pipelines and helps maintain rigorous performance standards for AI applications.

Key Features

01 Create and configure Langfuse datasets with custom metadata and descriptions

02 Automated extraction of trace inputs, outputs, and metadata for evaluation

03 Batch-add production traces to datasets using trace ID text files

04 Support for eval_infra_v1 contracts to facilitate optimization loops

05 Idempotent metadata patching for dataset versioning and contract updates
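
Idempotent patching means that applying the same patch more than once leaves the metadata in the same state as applying it once. How the skill implements this is not stated; the Python sketch below shows one common approach, a recursive merge, using a hypothetical helper `patch_metadata` and made-up metadata keys.

```python
from copy import deepcopy
from typing import Any, Dict


def patch_metadata(existing: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `patch` merged into `existing`.

    Nested dicts are merged key by key; any other value is overwritten.
    The inputs are never mutated, and re-applying the same patch to the
    result changes nothing.
    """
    merged = deepcopy(existing)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = patch_metadata(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical metadata and patch, for illustration only.
metadata = {"version": 1, "eval": {"threshold": 0.7}}
patch = {"version": 2, "eval": {"judge_prompt": "Rate accuracy."}}

once = patch_metadata(metadata, patch)
twice = patch_metadata(once, patch)
```

Because the merge only sets keys to the values named in the patch, a retried or repeated update cannot drift the dataset metadata further.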

Use Cases

01 Curating failing production traces into regression datasets for debugging

02 Managing evaluation infrastructure metadata for automated LLM testing

03 Building verified 'golden sets' of high-quality outputs for model baselining
