01. Automated generation of evaluation harnesses and test scenario templates
02. 0 GitHub stars
03. Multi-backend support, including Claude and OpenCode execution
04. Integrated troubleshooting workflow for refining skill prompts and instructions
05. LLM-as-judge verification for high-level behavioral pass/fail assessment
06. Standardized YAML schema for managing golden example test cases