1. Query detailed LLM generation traces, including inputs, outputs, and latency
2. Manage and fetch prompt versions using production or staging labels (see the first sketch after this list)
3. Monitor token usage and calculate costs across multiple model providers (see the cost sketch below)
4. Analyze metric summaries to identify error patterns and success rates
5. Convert failed production traces into Promptfoo test cases for evaluation (see the final sketch below)
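
The list does not name the underlying observability platform, so the sketch below assumes a Langfuse-style Python SDK in which `get_prompt` accepts a `label` argument for selecting the production or staging version. The credentials, host, prompt name, and template variable are all placeholders.

```python
from langfuse import Langfuse

# Assumes a Langfuse-style SDK; keys and host are hypothetical placeholders.
langfuse = Langfuse(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com",
)

# Fetch the prompt version currently tagged "production".
# Swap the label to "staging" to preview unreleased changes.
prompt = langfuse.get_prompt("support-agent", label="production")

# Fill the prompt's template variables and render the final text.
compiled = prompt.compile(user_question="How do I reset my password?")
print(compiled)
```

Pinning fetches to a label rather than a version number lets you promote or roll back prompts server-side without redeploying application code.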
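
Cost tracking across providers usually reduces to multiplying token counts by per-model prices. A minimal sketch, assuming you already have usage records with prompt and completion token counts; the price table is illustrative, not current pricing, and the record shape is an assumption.

```python
# Illustrative per-1M-token prices in USD; real prices vary by provider and date.
PRICES = {
    "gpt-4o":        {"input": 2.50, "output": 10.00},
    "claude-sonnet": {"input": 3.00, "output": 15.00},
}

def usage_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one generation, given its token counts."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000

# Example: aggregate cost over a batch of hypothetical usage records.
records = [
    {"model": "gpt-4o", "prompt_tokens": 1200, "completion_tokens": 350},
    {"model": "claude-sonnet", "prompt_tokens": 800, "completion_tokens": 600},
]
total = sum(
    usage_cost(r["model"], r["prompt_tokens"], r["completion_tokens"])
    for r in records
)
print(f"total cost: ${total:.4f}")
```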
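
Converting failed traces into regression tests amounts to mapping each trace's input onto a Promptfoo test case's `vars` and encoding the expected behavior as an `assert`. A sketch assuming traces arrive as plain dicts with `input` and `expected` fields (a hypothetical schema); the emitted YAML follows Promptfoo's documented `tests` format.

```python
import yaml  # PyYAML

# Hypothetical failed traces pulled from production; the field names
# ("input", "expected") are assumptions about your trace schema.
failed_traces = [
    {"input": "Cancel my subscription", "expected": "cancellation"},
    {"input": "Where is my invoice?", "expected": "invoice"},
]

# Map each trace onto a Promptfoo test case: trace input becomes a var,
# expected behavior becomes a "contains" assertion.
tests = [
    {
        "vars": {"query": t["input"]},
        "assert": [{"type": "contains", "value": t["expected"]}],
    }
    for t in failed_traces
]

with open("failed_trace_tests.yaml", "w") as f:
    yaml.safe_dump({"tests": tests}, f, sort_keys=False)
```

In practice you would reference the generated file from an existing promptfooconfig.yaml (which also declares `prompts` and `providers`), so each production failure becomes a permanent regression check.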