01. Evidence-based reporting requiring specific quotes and examples for every score.
02. Iterative debate rounds where judges defend positions and challenge counter-arguments.
03. Automated consensus detection based on weighted scores and specific criteria gaps.
04. Multi-agent orchestration with three independent parallel judges to prevent groupthink.
05. Parallel execution support optimized for high-rigor models like Claude 3 Opus.
06. 542 GitHub stars
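
The features listed above (three parallel judges, evidence for every score, and consensus detection from weighted scores and per-criterion gaps) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration, not the project's actual API: the criteria and their weights, the 1-5 score scale, the gap thresholds, and the `stub_judge` function, which stands in for a real model call.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical criteria and weights; the source does not specify any.
WEIGHTS = {"accuracy": 0.5, "clarity": 0.3, "completeness": 0.2}


@dataclass
class Verdict:
    judge: str
    scores: dict    # criterion -> score, assumed 1-5 scale
    evidence: dict  # criterion -> quote or example backing the score


def weighted_score(verdict):
    """Combine per-criterion scores into one weighted total."""
    return sum(WEIGHTS[c] * verdict.scores[c] for c in WEIGHTS)


def missing_evidence(verdict):
    """List criteria that were scored without a supporting quote."""
    return [c for c in WEIGHTS if not verdict.evidence.get(c)]


def criteria_gaps(verdicts):
    """Per-criterion spread between the highest and lowest judge."""
    return {
        c: max(v.scores[c] for v in verdicts) - min(v.scores[c] for v in verdicts)
        for c in WEIGHTS
    }


def has_consensus(verdicts, max_overall_gap=0.5, max_criterion_gap=1.0):
    """Consensus holds when weighted totals and every criterion are close.

    Both thresholds are illustrative assumptions.
    """
    totals = [weighted_score(v) for v in verdicts]
    overall_ok = max(totals) - min(totals) <= max_overall_gap
    gaps_ok = all(g <= max_criterion_gap for g in criteria_gaps(verdicts).values())
    return overall_ok and gaps_ok


async def stub_judge(name, scores):
    """Stand-in for one independent judge; a real judge would call a model."""
    await asyncio.sleep(0)
    evidence = {c: f"quoted passage supporting {c}" for c in scores}
    return Verdict(name, scores, evidence)


async def run_panel(judge_inputs):
    """Run all judges concurrently so none sees another's verdict first."""
    tasks = [stub_judge(name, scores) for name, scores in judge_inputs.items()]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    panel = asyncio.run(run_panel({
        "judge_a": {"accuracy": 4, "clarity": 4, "completeness": 4},
        "judge_b": {"accuracy": 4, "clarity": 4, "completeness": 3},
        "judge_c": {"accuracy": 4, "clarity": 4, "completeness": 4},
    }))
    print("consensus:", has_consensus(panel))
```

In a design like this, a failed consensus check would presumably trigger another debate round, with `criteria_gaps` pointing the judges at the criteria where they disagree most.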