01 Multi-model evaluation across Bedrock, OpenAI, and Gemini models simultaneously
02 Statistical rigor with Fleiss' kappa, bootstrap confidence intervals, and Kendall's W
03 Self-consistency mode that works without API keys by using the host model via MCP Sampling
04 AI-powered schema design and suggestions based on your data
05 Trend tracking to compare runs over time and detect agreement degradation
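To make the agreement statistics named above concrete, here is a minimal sketch of Fleiss' kappa with a percentile bootstrap confidence interval, resampling items with replacement. It is not the tool's actual implementation: the function names `fleiss_kappa` and `bootstrap_ci`, the input layout (one list of labels per item, one label per model), and the sample labels are all assumptions made for illustration.

```python
import random
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Assumes every item was labeled by the same number of raters (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        # Only one category was ever used, so kappa is undefined.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


def bootstrap_ci(ratings, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for kappa, resampling items with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        fleiss_kappa([rng.choice(ratings) for _ in ratings]) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical labels from three models on four items.
sample = [
    ["pos", "pos", "neg"],
    ["pos", "neg", "neg"],
    ["pos", "pos", "pos"],
    ["neg", "neg", "neg"],
]
kappa = fleiss_kappa(sample)
ci_low, ci_high = bootstrap_ci(sample)
print(f"kappa={kappa:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

With so few items the interval will be very wide; in practice the bootstrap is only informative once there are enough labeled items to resample from. Tracking this kappa value per run is one plausible way to detect the agreement degradation mentioned in the list.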