Manages human annotations and quality scores for Langfuse traces directly within the Claude Code environment.
The Langfuse Annotation Manager skill empowers developers to streamline their LLM evaluation workflows by managing human-in-the-loop annotations. It provides a comprehensive suite of tools to create, update, and delete trace scores across numeric, categorical, and boolean data types. By enabling users to quickly identify pending traces and export annotation data to JSON or CSV, it bridges the gap between observability and actionable quality improvements for AI applications.
Key Features
- Identify pending traces that require human review, filtered by timeframe
- Create numeric, categorical, and boolean scores for traces
- Update or delete existing trace scores and comments
- Export annotation data to JSON and CSV formats for analysis
- List available score configurations and types
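The three score types and the two export formats can be sketched with plain Python. This is a minimal, self-contained illustration, not the skill's actual implementation: the record fields (`traceId`, `name`, `value`, `dataType`, `comment`) are assumptions modeled loosely on Langfuse's public score schema, and the data is made up.

```python
import csv
import io
import json

# Hypothetical in-memory score records -- field names are assumptions
# modeled on Langfuse's score schema, not the skill's wire format.
scores = [
    {"traceId": "t-1", "name": "quality", "value": 0.9,
     "dataType": "NUMERIC", "comment": "fluent answer"},
    {"traceId": "t-2", "name": "intent", "value": "refund",
     "dataType": "CATEGORICAL", "comment": None},
    {"traceId": "t-3", "name": "hallucinated", "value": False,
     "dataType": "BOOLEAN", "comment": "grounded in context"},
]

def export_json(records):
    """Serialize score records to a pretty-printed JSON string."""
    return json.dumps(records, indent=2)

def export_csv(records):
    """Flatten score records into CSV with a fixed header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["traceId", "name", "value", "dataType", "comment"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Keeping one row per score (rather than one row per trace) makes the CSV robust when a trace carries several scores of different types.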
Use Cases
- Managing large-scale annotation workflows to find and review unrated traces
- Generating labeled datasets for fine-tuning by exporting verified scores
- Conducting manual QA and human-in-the-loop evaluation of LLM outputs
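The fine-tuning use case amounts to joining human scores back onto their traces and keeping only the ones that clear a quality bar. The sketch below assumes hypothetical trace fields (`input`, `output`), a score named `quality`, and an illustrative 0.8 threshold; none of these come from the skill itself.

```python
import json

# Hypothetical traces and human-verified scores (illustrative data).
traces = {
    "t-1": {"input": "Summarize the ticket.",
            "output": "Customer wants a refund."},
    "t-2": {"input": "Classify intent.", "output": "refund"},
}
verified = [
    {"traceId": "t-1", "name": "quality", "value": 0.9},
    {"traceId": "t-2", "name": "quality", "value": 0.4},
]

def to_jsonl(traces, scores, min_quality=0.8):
    """Keep traces whose human quality score clears the bar and emit
    prompt/completion pairs as JSON Lines, one record per line."""
    lines = []
    for s in scores:
        if s["name"] == "quality" and s["value"] >= min_quality:
            t = traces[s["traceId"]]
            lines.append(json.dumps(
                {"prompt": t["input"], "completion": t["output"]}))
    return "\n".join(lines)
```

Here only the 0.9-scored trace survives the filter, so the output is a single JSONL record.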