Comprehensive support for both LLMs and Vision-Language Models (VLMs)
Containerized architecture ensures fully reproducible benchmarking results
Multi-backend execution support for local Docker, Slurm HPC, and cloud platforms
Built-in export of results to MLflow and Weights & Biases, plus plain JSON output
Access 100+ benchmarks from 18+ evaluation harnesses in one platform