01Critical dimension monitoring with automated rollback triggers
02Multi-dimensional metric definition with custom weights and thresholds
03Support for both curated dataset and live-trace evaluation modes
040 GitHub stars
05Baseline performance tracking for iteration comparison
06Automated bootstrapping of Langfuse datasets and judge prompts