1. Query detailed LLM generation traces, including inputs, outputs, and latency
2. Manage and fetch prompt versions using production or staging labels (see the first sketch after this list)
3. Monitor token usage and calculate costs across multiple model providers (see the cost sketch below)
4. Analyze metric summaries to identify error patterns and success rates
5. Convert failed production traces into Promptfoo test cases for evaluation (see the final sketch below)
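
The list does not name the underlying observability platform, so the sketch below assumes a Langfuse-style Python SDK in which `get_prompt` accepts a `label` argument for selecting the production or staging version. The credentials, host, prompt name, and template variable are all placeholders.

```python
from langfuse import Langfuse

# Assumes a Langfuse-style SDK; keys and host are hypothetical placeholders.
langfuse = Langfuse(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com",
)

# Fetch the prompt version currently tagged "production".
# Swap the label to "staging" to preview unreleased changes.
prompt = langfuse.get_prompt("support-agent", label="production")

# Fill the prompt's template variables and render the final text.
compiled = prompt.compile(user_question="How do I reset my password?")
print(compiled)
```

Pinning fetches to a label rather than a version number lets you promote or roll back prompts server-side without redeploying application code.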
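
Cost tracking across providers usually reduces to multiplying token counts by per-model prices. A minimal sketch, assuming you already have usage records with prompt and completion token counts; the price table is illustrative, not current pricing, and the record shape is an assumption.

```python
# Illustrative per-1M-token prices in USD; real prices vary by provider and date.
PRICES = {
    "gpt-4o":        {"input": 2.50, "output": 10.00},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}

def usage_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one generation, given its token counts."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# Example: aggregate cost over a batch of hypothetical usage records.
records = [
    {"model": "gpt-4o", "prompt_tokens": 1200, "completion_tokens": 350},
    {"model": "claude-sonnet", "prompt_tokens": 800, "completion_tokens": 600},
]
total = sum(
    usage_cost(r["model"], r["prompt_tokens"], r["completion_tokens"])
    for r in records
)
print(f"total cost: ${total:.4f}")
```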
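
Converting failed traces into regression tests amounts to mapping each trace's input onto a Promptfoo test case's `vars` and encoding the expected behavior as an `assert`. A sketch assuming traces arrive as plain dicts with `input` and `expected` fields (a hypothetical schema); the emitted YAML follows Promptfoo's documented `tests` format.

```python
import yaml  # PyYAML

# Hypothetical failed traces pulled from production; the field names
# ("input", "expected") are assumptions about your trace schema.
failed_traces = [
    {"input": "Cancel my subscription", "expected": "cancellation"},
    {"input": "Where is my invoice?", "expected": "invoice"},
]

# Map each trace onto a Promptfoo test case: trace input becomes a var,
# expected behavior becomes a "contains" assertion.
tests = [
    {
        "vars": {"query": t["input"]},
        "assert": [{"type": "contains", "value": t["expected"]}],
    }
    for t in failed_traces
]

with open("failed_trace_tests.yaml", "w") as f:
    yaml.safe_dump({"tests": tests}, f, sort_keys=False)
```

In practice you would reference the generated file from an existing promptfooconfig.yaml (which also declares `prompts` and `providers`), so each production failure becomes a permanent regression check.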