Manages Langfuse datasets for AI regression testing and golden set curation directly through the Claude Code CLI.
The Langfuse Dataset Management skill connects LLM observability with evaluation by turning production traces into structured datasets. Its command-line interface creates datasets, batch-adds traces for regression testing, and maintains 'golden sets' of high-quality model outputs. Trace inputs and metadata are extracted automatically into schemas such as eval_infra_v1, which makes validation pipelines simpler to build and helps teams hold AI applications to consistent performance standards.
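The extraction step can be pictured as a mapping from a trace record to a dataset item. The sketch below is a hypothetical illustration, not the skill's actual code. The trace field names (`id`, `input`, `output`, `metadata`), the `contract` metadata key, and the `trace_to_dataset_item` helper are all assumptions, and the real eval_infra_v1 schema is not described in this listing. The output keys are loosely modeled on the fields a Langfuse dataset item carries, but check the Langfuse SDK documentation for your version before relying on them.

```python
from typing import Any


def trace_to_dataset_item(trace: dict[str, Any], contract: str = "eval_infra_v1") -> dict[str, Any]:
    """Map a trace record to a dataset-item dict (hypothetical shapes throughout)."""
    # Copy the metadata so the original trace record is not mutated.
    metadata = dict(trace.get("metadata") or {})
    # Tag the item with the contract it was extracted for (assumed key name).
    metadata["contract"] = contract
    return {
        "input": trace["input"],
        "expected_output": trace.get("output"),  # treat the observed output as the baseline
        "source_trace_id": trace["id"],  # keep a link back to the production trace
        "metadata": metadata,
    }


trace = {"id": "t-1", "input": {"q": "hi"}, "output": "hello", "metadata": {"env": "prod"}}
item = trace_to_dataset_item(trace)
print(item["metadata"])  # {'env': 'prod', 'contract': 'eval_infra_v1'}
```

Using the observed output as `expected_output` only makes sense for golden-set items that someone has reviewed; for failing traces you would typically leave it empty or fill in a corrected answer.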
Key Features
01 Create and configure Langfuse datasets with custom metadata and descriptions
02 Automated extraction of trace inputs, outputs, and metadata for evaluation
03 Batch-add production traces to datasets using trace ID text files
04 Support for eval_infra_v1 contracts to facilitate optimization loops
05 Idempotent metadata patching for dataset versioning and contract updates
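Two of the features above, batch-adding from a trace ID file and idempotent metadata patching, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the skill's implementation: the file format (one trace ID per line, blank lines and `#` comments ignored, duplicates dropped) and the helper names `read_trace_ids` and `patch_metadata` are hypothetical. The patch is a shallow merge that reports whether anything changed, so re-applying the same patch is a no-op.

```python
import tempfile
from pathlib import Path
from typing import Any


def read_trace_ids(path: Path) -> list[str]:
    """Read one trace ID per line (assumed format), skipping blanks, '#' comments, and repeats."""
    seen: set[str] = set()
    ids: list[str] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        trace_id = line.strip()
        if not trace_id or trace_id.startswith("#") or trace_id in seen:
            continue
        seen.add(trace_id)
        ids.append(trace_id)  # preserve first-occurrence order
    return ids


def patch_metadata(current: dict[str, Any], patch: dict[str, Any]) -> tuple[dict[str, Any], bool]:
    """Shallow-merge patch into current; return the result and whether it differs from current."""
    merged = {**current, **patch}
    # Applying the same patch again yields an identical dict, so changed is False.
    return merged, merged != current


with tempfile.TemporaryDirectory() as tmp:
    ids_file = Path(tmp) / "trace_ids.txt"
    ids_file.write_text("# failing traces\nabc123\n\ndef456\nabc123\n", encoding="utf-8")
    ids = read_trace_ids(ids_file)
print(ids)  # ['abc123', 'def456']

meta, changed = patch_metadata({"version": 1}, {"contract": "eval_infra_v1"})
meta_again, changed_again = patch_metadata(meta, {"contract": "eval_infra_v1"})
print(changed, changed_again)  # True False
```

Reporting `changed` lets a caller skip the remote update entirely when the metadata already matches, which is one simple way to keep repeated runs safe.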
Use Cases
01 Curating failing production traces into regression datasets for debugging
02 Managing evaluation infrastructure metadata for automated LLM testing
03 Building verified 'golden sets' of high-quality outputs for model baselining