01 OpenTelemetry-based trace collection for LangChain, LlamaIndex, and OpenAI
02 LLM-as-judge evaluators for hallucination, relevance, and toxicity detection
03 Interactive playground for side-by-side prompt and model comparisons
04 Production-ready monitoring with self-hosted PostgreSQL or SQLite support
05 Versioned dataset management for experiments and regression testing
06 3,983 GitHub stars