Autonomously enhances Claude Code skills through iterative benchmarking, reflection-driven prompt mutation, and performance scoring.
Skill Auto-Optimizer implements an autonomous research loop to systematically increase the reliability of any Claude Code skill from baseline to production-grade performance. By adapting Andrej Karpathy's autoresearch methodology, the skill repeatedly executes target tasks, scores results against binary evaluation criteria, and uses reflection-driven mutation to diagnose and fix specific failure patterns. It provides a comprehensive optimization environment featuring a live HTML dashboard for real-time progress tracking, structured session archives to prevent redundant experiments, and sophisticated 'stuck detection' to overcome performance plateaus without manual intervention.
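The execute → score → reflect → mutate loop described above can be sketched as follows. This is a minimal illustrative sketch, not the skill's actual implementation: the function names (`run_task`, `score`, `mutate`), the patience-based stuck detection, and the toy demo at the bottom are all assumptions.

```python
def optimize(run_task, score, mutate, iters=10, patience=3):
    """Iteratively benchmark a prompt, keep the best variant, and mutate it.

    run_task(prompt) -> output; score(output) -> float in [0, 1];
    mutate(prompt, escalate) -> new prompt. All hypothetical signatures.
    """
    prompt = "v0"
    best = (score(run_task(prompt)), prompt)  # baseline measurement
    stalled = 0
    for _ in range(iters):
        if best[0] >= 1.0:          # all evaluation criteria pass
            break
        # 'Stuck detection': after repeated non-improvement, escalate mutation.
        candidate = mutate(best[1], escalate=stalled >= patience)
        s = score(run_task(candidate))
        if s > best[0]:
            best, stalled = (s, candidate), 0
        else:
            stalled += 1
    return best

# Toy demo (purely illustrative): each mutation appends a "+fix" marker,
# and the task fully passes once three fixes have accumulated.
run_task = lambda p: p.count("+fix")
score = lambda n: min(n / 3, 1.0)
mutate = lambda p, escalate: p + ("+fix+fix" if escalate else "+fix")
best_score, best_prompt = optimize(run_task, score, mutate)
# best_score reaches 1.0 after three improving iterations
```

The key design point is that the loop always mutates the best-known prompt rather than the most recent one, so a bad mutation never becomes the new starting point.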
Key Features
1. Reflection-driven mutation that diagnoses root causes of failed outputs
2. Live HTML dashboard for real-time visualization of improvement trends
3. Structured session archives to ensure cross-session experiment continuity
4. Automated binary evaluation system for objective performance scoring
5. Regression detection to prevent improvements in one area from breaking others
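Binary evaluation and regression detection (features 4 and 5 above) compose naturally: each criterion is a pass/fail predicate, and a candidate regresses if any criterion that passed before now fails. A minimal sketch, where the criterion names and checks are invented for illustration:

```python
def evaluate(output, criteria):
    """Binary evaluation: map each named criterion to a pass/fail result."""
    return {name: check(output) for name, check in criteria.items()}

def is_regression(prev, curr):
    """True if any criterion that previously passed now fails."""
    return any(prev[name] and not curr[name] for name in prev)

# Hypothetical criteria for a skill that must emit a markdown report.
criteria = {
    "has_header": lambda o: o.startswith("# "),
    "nonempty_body": lambda o: len(o.splitlines()) > 1,
}

prev = evaluate("# Title\nbody", criteria)  # both criteria pass
curr = evaluate("Title\nbody", criteria)    # header check now fails
# is_regression(prev, curr) -> the mutation broke a passing criterion
```

An optimizer would reject `curr` even if it improved some other score, since accepting it would trade one fixed failure for a new one.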
Use Cases
1. Automating the iterative prompt engineering cycle for complex workflows
2. Refining inconsistent skills that fail on edge cases or specific formatting
3. Establishing performance baselines for new AI agents and skills