Which SDKs does this skill support?

It supports four major implementations: Mini, Claude, Copilot, and Microsoft SDK adapters, allowing for side-by-side benchmarking.

What is the minimum improvement required to commit changes?

The builder requires a net improvement of at least 2% across the evaluation suite and will automatically revert changes if any single level regresses by more than 5%.
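The gating rule above can be sketched as a small function. This is a minimal illustration, not the skill's actual code: the name `decide`, the score dictionaries, and the parameter names are hypothetical. It also assumes that scores are percentages, that "net improvement" means the mean change across levels in percentage points, and that any result that does not qualify for a commit is reverted.

```python
from typing import Dict


def decide(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    min_net_gain: float = 2.0,    # assumed: percentage points, averaged over levels
    max_level_drop: float = 5.0,  # assumed: percentage points on a single level
) -> str:
    """Return "commit" or "revert" for a candidate set of per-level scores."""
    if not baseline or baseline.keys() != candidate.keys():
        raise ValueError("baseline and candidate must cover the same non-empty set of levels")

    # Per-level change; positive means the candidate scored higher.
    deltas = {level: candidate[level] - baseline[level] for level in baseline}

    # A single level regressing by more than the limit forces a revert,
    # regardless of how much the other levels improved.
    if any(delta < -max_level_drop for delta in deltas.values()):
        return "revert"

    # Otherwise commit only when the mean change meets the threshold.
    net_gain = sum(deltas.values()) / len(deltas)
    return "commit" if net_gain >= min_net_gain else "revert"
```

For example, under these assumptions a change that lifts L1 from 80 to 86 while L2 slips from 60 to 58 would be committed (mean change of 2 points, no level down by more than 5), whereas a change that drops L2 to 54 would be reverted even if L1 improved sharply.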

How does the Research phase prevent blind changes?

It requires the agent to state a hypothesis, gather evidence from evaluation results, and consider counter-arguments before deciding whether to apply, skip, or defer a fix.
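One way to picture this safeguard is a record that refuses a decision until the hypothesis, evidence, and counter-arguments have all been filled in. The sketch below is illustrative only: `ResearchNote`, `Decision`, and their fields are hypothetical names and do not come from the skill itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    APPLY = "apply"
    SKIP = "skip"
    DEFER = "defer"


@dataclass
class ResearchNote:
    """Hypothetical record of one Research-phase entry."""
    hypothesis: str
    evidence: List[str] = field(default_factory=list)
    counter_arguments: List[str] = field(default_factory=list)
    decision: Optional[Decision] = None

    def decide(self, decision: Decision) -> None:
        # Refuse to record a decision until every part of the reasoning exists.
        missing = [
            name
            for name, value in (
                ("hypothesis", self.hypothesis.strip()),
                ("evidence", self.evidence),
                ("counter_arguments", self.counter_arguments),
            )
            if not value
        ]
        if missing:
            raise ValueError("cannot decide without: " + ", ".join(missing))
        self.decision = decision
```

Under this design, a fix can only be marked apply, skip, or defer once the supporting reasoning is written down, which is the property the Research phase is described as enforcing.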

Can I run the improvement cycle without modifying my code?

Yes, you can use the --dry-run flag to perform evaluation and analysis without applying any permanent changes to the agent's source code.

Self-Improving Agent Builder

Name: Self-Improving Agent Builder
Author: rysweet

Data Science & ML

Automates the iterative optimization of AI agents through a continuous evaluation, analysis, and research-driven improvement loop.

The Self-Improving Agent Builder is a framework for improving the performance of goal-seeking agents through a closed-loop improvement cycle. It automates the cycle in six phases: evaluating performance against the L1-L12 test suites, analyzing failures with a specific error taxonomy, conducting hypothesis-driven research, applying targeted code or prompt fixes, re-evaluating to verify the results, and deciding whether to commit or revert. Built-in regression protection ensures that improvements are committed only when they meet strict performance thresholds, which makes the skill useful for developers who want to build robust agentic systems without manual trial and error.
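The overall cycle can be sketched as a single function that wires the phases together. This is a rough outline under stated assumptions, not the skill's implementation: `run_cycle` and every callable it accepts are hypothetical placeholders supplied by the caller, and the `dry_run` parameter simply models the idea of stopping before any change is applied.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]  # assumed shape: level name -> score


def run_cycle(
    evaluate: Callable[[], Scores],
    analyze: Callable[[Scores], List[str]],
    research: Callable[[List[str]], List[str]],
    improve: Callable[[List[str]], None],
    decide: Callable[[Scores, Scores], str],
    revert: Callable[[], None],
    dry_run: bool = False,
) -> str:
    """Run one evaluate/analyze/research/improve/verify/decide pass."""
    baseline = evaluate()          # evaluate: score the current agent
    failures = analyze(baseline)   # analyze: classify what went wrong
    plan = research(failures)      # research: choose which fixes to apply

    # In a dry run, or when research selects nothing, leave the agent untouched.
    if dry_run or not plan:
        return "no-change"

    improve(plan)                  # improve: apply code or prompt fixes
    candidate = evaluate()         # verify: score the modified agent
    verdict = decide(baseline, candidate)
    if verdict != "commit":
        revert()                   # undo the fixes when the gate rejects them
    return verdict
```

Passing the phases in as callables keeps the sketch independent of any particular SDK adapter or test harness.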

Key Features

01 Regression gating that auto-commits net improvements and reverts performance drops

02 38 GitHub stars

03 Continuous EVAL-ANALYZE-RESEARCH-IMPROVE-DECIDE loop for automated agent evolution

04 Multi-SDK benchmarking support for Mini, Claude, Copilot, and Microsoft implementations

05 L1-L12 progressive test suite for measuring complex reasoning and task execution

06 Automated failure taxonomy mapping to identify specific code or prompt weaknesses

Use Cases

01 Benchmarking different LLM SDK implementations to identify the most efficient framework for specific tasks

02 Iteratively improving the accuracy of custom AI coding agents through automated prompt engineering

03 Fixing persistent agent regressions by analyzing failure patterns and applying research-backed adjustments

Install with 🐟 Skill.Fish

npx skillfish add rysweet/amplihack self-improving-agent-builder