How many judges are involved in the process?

The skill orchestrates three independent judge agents in parallel to ensure a diversity of perspectives and minimize systematic bias.
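As a rough illustration of running three judges in parallel, here is a minimal Python sketch using `asyncio`. It is not the skill's actual implementation: `run_judge`, `run_judges`, and the report fields are hypothetical names, and the judge body is a placeholder for a real model call.

```python
import asyncio


async def run_judge(judge_id: int, solution: str) -> dict:
    """Hypothetical stand-in for one judge agent (a real judge would call a model)."""
    await asyncio.sleep(0)  # placeholder for the model call
    # Dummy score so the sketch runs end to end; not a real evaluation.
    return {"judge": judge_id, "score": 3.0 + 0.5 * judge_id, "evidence": []}


async def run_judges(solution: str, n_judges: int = 3) -> list[dict]:
    # Launch all judges concurrently; no judge sees another's output at this stage.
    tasks = (run_judge(i, solution) for i in range(1, n_judges + 1))
    return await asyncio.gather(*tasks)


reports = asyncio.run(run_judges("def add(a, b): return a + b"))
```

`asyncio.gather` returns results in the order the judges were launched, so each report can be matched back to its judge.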

What is the Judge with Debate pattern?

It is an evaluation strategy where multiple AI agents independently review a solution and then debate their disagreements to reach a more accurate, evidence-backed consensus.

Can I customize the evaluation criteria?

Yes. You can define your own criteria, each with a description and a weight, to tailor the evaluation to your project or domain.
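One way to picture criteria with descriptions and weights is the sketch below. The `Criterion` class, the criterion names, the weights, and the 1 to 5 score scale are all assumptions for illustration; the skill's real configuration format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    description: str
    weight: float


def weighted_score(criteria: list[Criterion], scores: dict[str, float]) -> float:
    """Weighted average of per-criterion scores, normalised by total weight."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("criteria weights must sum to a positive number")
    return sum(c.weight * scores[c.name] for c in criteria) / total_weight


# Hypothetical criteria; the names and weights are illustrative only.
criteria = [
    Criterion("correctness", "Does the solution meet the requirements?", 0.5),
    Criterion("readability", "Is the code clear and maintainable?", 0.3),
    Criterion("test_coverage", "Are the important paths tested?", 0.2),
]

# One judge's scores on an assumed 1-5 scale.
scores = {"correctness": 4, "readability": 3, "test_coverage": 5}
overall = weighted_score(criteria, scores)
```

Normalising by the total weight means the weights do not have to sum to exactly 1.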

Where are the evaluation reports stored?

All individual and consensus reports are saved locally in the '.specs/reports/' directory using a standardized naming convention for easy versioning and auditing.

What happens if the judges cannot reach a consensus?

If no consensus is reached after three debate rounds, the skill flags the results for human review and provides all individual judge reports for manual analysis.
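The round limit and the fallback to human review can be sketched as a simple loop. Only the three-round cap comes from the description above; the consensus rule (overall scores within a tolerance of 0.5), the `debate` function, and the toy `move_toward_mean` revision step are assumptions, not the skill's documented behaviour.

```python
from typing import Callable

MAX_ROUNDS = 3         # from the description: three debate rounds
SCORE_TOLERANCE = 0.5  # assumption: maximum allowed spread between judges


def has_consensus(scores: list[float], tolerance: float = SCORE_TOLERANCE) -> bool:
    """Assumed rule: judges agree when their overall scores are close enough."""
    return max(scores) - min(scores) <= tolerance


def debate(
    initial_scores: list[float],
    revise: Callable[[list[float]], list[float]],
    max_rounds: int = MAX_ROUNDS,
) -> dict:
    """Run debate rounds until consensus or the round limit is reached."""
    scores = list(initial_scores)
    rounds_used = 0
    while not has_consensus(scores) and rounds_used < max_rounds:
        scores = revise(scores)  # one debate round: judges update their scores
        rounds_used += 1
    agreed = has_consensus(scores)
    return {
        "consensus": agreed,
        "needs_human_review": not agreed,  # escalate when no agreement is reached
        "rounds": rounds_used,
        "scores": scores,
    }


def move_toward_mean(scores: list[float]) -> list[float]:
    """Toy revision step: each judge moves halfway toward the group mean."""
    mean = sum(scores) / len(scores)
    return [s + 0.5 * (mean - s) for s in scores]


result = debate([2.0, 4.0, 5.0], move_toward_mean)
stalemate = debate([2.0, 4.0, 5.0], lambda s: s)  # judges never change their minds
```

In the stalemate case the loop stops after the third round and sets `needs_human_review`, mirroring the escalation path described above.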

Judge with Debate

Name: Judge with Debate
Author: NeoLabHQ

Developer Tools

Orchestrates a multi-agent debate between independent AI judges to reach a high-accuracy consensus on solution quality.

The Judge with Debate skill implements the Multi-Agent Debate pattern to provide rigorous evaluations of code or architectural solutions. It launches three independent judge agents that analyze, challenge, and refine their assessments over multiple rounds, which reduces groupthink and pushes each judge toward evidence-based evaluation. The final assessment is therefore not a single-pass opinion but a consensus that has survived challenge, backed by specific code quotes and weighted criteria, which improves the reliability of automated code reviews.

Key Features

01 Evidence-based reporting requiring specific quotes and examples for every score.

02 Iterative debate rounds where judges defend positions and challenge counter-arguments.

03 Automated consensus detection based on weighted scores and specific criteria gaps.

04 Multi-agent orchestration with three independent parallel judges to prevent groupthink.

05 Parallel execution support optimized for high-rigor models like Claude 3 Opus.

06 542 GitHub stars

Use Cases

01 Evaluating architectural designs against competing technical constraints and requirements.

02 Standardizing quality assessment criteria across a distributed development team's output.

03 Performing rigorous code reviews for critical infrastructure or complex business logic.

Install with 🐟 Skill.Fish

npx skillfish add neolabhq/context-engineering-kit judge-with-debate
