Leverage LLMs to benchmark application performance using response and retrieval evals.
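A response eval like the one described above can be sketched as an LLM-as-judge loop. This is a minimal illustration, not the product's actual API: `call_judge_llm` is a hypothetical stand-in stub for a real LLM client call, and the template text is an assumption.

```python
# Minimal sketch of an LLM-as-judge response eval.
# `call_judge_llm` is a hypothetical stand-in for a real LLM API call;
# swap in your provider's client.
def call_judge_llm(prompt: str) -> str:
    # Stubbed verdict for illustration only; a real judge model
    # would read the prompt and return its judgment text.
    return "correct" if "Paris" in prompt else "incorrect"

EVAL_TEMPLATE = (
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Is the answer correct? Reply 'correct' or 'incorrect'."
)

def evaluate_responses(examples):
    """Score each (question, answer) pair with the judge LLM."""
    results = []
    for ex in examples:
        verdict = call_judge_llm(EVAL_TEMPLATE.format(**ex))
        results.append({**ex, "label": verdict})
    return results

examples = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Capital of France?", "answer": "Lyon"},
]
scored = evaluate_responses(examples)
```

The same loop generalizes to retrieval evals by templating the judged prompt over (query, retrieved document) pairs instead of (question, answer) pairs.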
Track and evaluate changes to prompts, LLMs, and retrieval.
Trace LLM application runtime using OpenTelemetry-based instrumentation.
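Conceptually, OpenTelemetry-style instrumentation wraps each step of an LLM call (retrieval, generation) in a timed, attributed span. The dependency-free sketch below mimics that span model with a context manager; real instrumentation would use the `opentelemetry` packages instead, and the span names and attributes here are illustrative assumptions.

```python
import time
from contextlib import contextmanager

# Toy tracer illustrating the span model that OpenTelemetry-based
# instrumentation records for an LLM app. Real code would use the
# opentelemetry API/SDK; this only mirrors the nesting and timing idea.
SPANS = []

@contextmanager
def span(name, **attributes):
    start = time.perf_counter()
    record = {"name": name, "attributes": attributes}
    try:
        yield record
    finally:
        # Completed spans are appended innermost-first, as each
        # context manager exits.
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)

with span("llm.chat", model="hypothetical-model"):  # parent: one user turn
    with span("retrieval", top_k=3):                # child: document lookup
        pass
    with span("llm.generate"):                      # child: model call
        pass
```

Because each `span` context records its duration on exit, the collected `SPANS` list reconstructs where time went inside a single traced LLM call.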
Create versioned datasets of examples for experimentation, evaluation, and fine-tuning.
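One common way to version a dataset of examples is to derive a deterministic id from its content, so any experiment or fine-tuning run can be pinned to an exact revision. This is a generic sketch of that idea, not the product's own versioning scheme.

```python
import hashlib
import json

# Sketch: version a dataset of examples by content hash, so each
# revision used for experiments or fine-tuning is reproducible.
def dataset_version(examples) -> str:
    """Deterministic version id derived from the examples' content."""
    canonical = json.dumps(examples, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

v1 = [{"input": "2+2", "output": "4"}]
v2 = v1 + [{"input": "3+3", "output": "6"}]

v1_id = dataset_version(v1)
v2_id = dataset_version(v2)
```

Editing or appending an example yields a new version id, while re-serializing identical content always reproduces the same id.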
Optimize prompts, compare models, adjust parameters, and replay traced LLM calls.