Analyzes and visualizes LLM quality scores, trends, and regressions within the Langfuse observability platform.
Langfuse Score Analytics is a specialized Claude Code skill for developers and LLM engineers who need deep insight into model performance. It provides automated tools to track score metrics over time, detect performance regressions between releases, and compare quality across environments or trace names. By integrating directly with Langfuse, it lets users visualize score distributions and trends from the CLI, making it easier to monitor AI application reliability and optimize model outputs based on quantitative data.
Key Features
- Analyze score trends with configurable time granularity
- Compare quality metrics across releases, environments, or trace names
- Detect performance regressions by comparing baseline and current windows
- Visualize score distributions using histogram-style binning
- Summarize and list all available evaluation scores in a Langfuse project
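As a rough illustration of the histogram-style binning mentioned above, a minimal sketch in plain Python is shown below. The `bin_scores` helper is hypothetical (not the skill's actual code) and assumes numeric score values have already been fetched from Langfuse:

```python
# Hypothetical sketch: bin numeric score values into fixed-width buckets
# for a histogram-style distribution view. Not the skill's actual code.

def bin_scores(values, num_bins=10, lo=0.0, hi=1.0):
    """Count how many scores fall into each of `num_bins` equal-width
    buckets over [lo, hi]; out-of-range values are clamped to the edges."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(num_bins - 1, idx))  # clamp (handles v == hi and outliers)
        counts[idx] += 1
    return counts

# Example: user-feedback scores between 0 and 1, split into 5 buckets
print(bin_scores([0.1, 0.15, 0.5, 0.92, 1.0], num_bins=5))  # → [2, 0, 1, 0, 2]
```

Each count can then be rendered as a bar of proportional width for a quick terminal-friendly distribution view.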
Use Cases
- Monitoring LLM performance shifts after deploying a new prompt or model version
- Identifying quality discrepancies between staging and production environments
- Quantifying the distribution of user-feedback scores to prioritize model improvements
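The baseline-vs-current comparison behind the first use case can be sketched as follows. The function name and threshold are illustrative assumptions, not the skill's actual API; it assumes two lists of scores drawn from the two time windows:

```python
# Hypothetical sketch: flag a regression when the current window's mean
# score falls below the baseline mean by more than a tolerance.
# Names and default threshold are illustrative, not the skill's actual API.
from statistics import mean

def detect_regression(baseline, current, max_drop=0.05):
    """Return (regressed, delta), where delta = current mean - baseline mean
    and regressed is True when the drop exceeds max_drop."""
    delta = mean(current) - mean(baseline)
    return delta < -max_drop, round(delta, 4)

# Scores before vs. after a prompt change
regressed, delta = detect_regression([0.9, 0.85, 0.88], [0.7, 0.75, 0.72])
print(regressed, delta)  # → True -0.1533
```

A mean comparison is the simplest choice; comparing medians or full distributions would be less sensitive to a handful of outlier traces.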