Unified interface for 18+ evaluation harnesses (simple-evals, bigcode, etc.)
Multi-target support for NVIDIA NIM, vLLM, and OpenAI-compatible APIs
Access to 100+ industry-standard benchmarks including MMLU, GSM8K, and HumanEval
Containerized, reproducible execution across local, Slurm, and Lepton backends
Direct export of results to MLflow, Weights & Biases, and local YAML files