Analyzes and visualizes LLM quality scores, trends, and regressions within the Langfuse observability platform.
Langfuse Score Analytics is a specialized Claude Code skill for developers and LLM engineers who need deep insight into model performance. It provides automated tools to track score metrics over time, detect performance regressions between releases, and compare quality across environments or trace names. By integrating directly with Langfuse, it lets users visualize score distributions and trends from the CLI, making it easier to monitor AI application reliability and optimize model outputs based on quantitative data.
Key Features
- Analyze score trends with configurable time granularity
- Compare quality metrics across releases, environments, or trace names
- Detect performance regressions by comparing baseline and current windows
- Visualize score distributions using histogram-style binning
- Summarize and list all available evaluation scores in a Langfuse project
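As a rough illustration of the histogram-style binning mentioned above, a minimal sketch in plain Python is shown below. The `bin_scores` helper is hypothetical (not the skill's actual code) and assumes numeric score values have already been fetched from Langfuse:

```python
# Hypothetical sketch: bin numeric score values into fixed-width buckets
# for a histogram-style distribution view. Not the skill's actual code.

def bin_scores(values, num_bins=10, lo=0.0, hi=1.0):
    """Count how many scores fall into each of `num_bins` equal-width
    buckets over [lo, hi]; out-of-range values are clamped to the edges."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(num_bins - 1, idx))  # clamp (handles v == hi and outliers)
        counts[idx] += 1
    return counts

# Example: user-feedback scores between 0 and 1, split into 5 buckets
print(bin_scores([0.1, 0.15, 0.5, 0.92, 1.0], num_bins=5))  # → [2, 0, 1, 0, 2]
```

Each count can then be rendered as a bar of proportional width for a quick terminal-friendly distribution view.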
Use Cases
- Monitoring LLM performance shifts after deploying a new prompt or model version
- Identifying quality discrepancies between staging and production environments
- Quantifying the distribution of user-feedback scores to prioritize model improvements
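The baseline-vs-current comparison behind the first use case can be sketched as follows. The function name and threshold are illustrative assumptions, not the skill's actual API; it assumes two lists of scores drawn from the two time windows:

```python
# Hypothetical sketch: flag a regression when the current window's mean
# score falls below the baseline mean by more than a tolerance.
# Names and default threshold are illustrative, not the skill's actual API.
from statistics import mean

def detect_regression(baseline, current, max_drop=0.05):
    """Return (regressed, delta), where delta = current mean - baseline mean
    and regressed is True when the drop exceeds max_drop."""
    delta = mean(current) - mean(baseline)
    return delta < -max_drop, round(delta, 4)

# Scores before vs. after a prompt change
regressed, delta = detect_regression([0.9, 0.85, 0.88], [0.7, 0.75, 0.72])
print(regressed, delta)  # → True -0.1533
```

A mean comparison is the simplest choice; comparing medians or full distributions would be less sensitive to a handful of outlier traces.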