How does the skill prevent performance regressions?

The protocol enforces predefined thresholds and guard checks on every experiment. It automatically documents learnings and rolls back any change whose improvement does not meet the criteria.
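As a rough illustration of the keep-or-rollback rule described above, here is a minimal Python sketch. The `Policy` class, its field names, and the `decide` function are hypothetical and are not taken from the skill itself; the skill's actual thresholds and guard definitions live in its evaluation contract and may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical fields; the skill's real policy schema may differ.
    target_metric: str
    min_improvement: float       # required gain on the target metric
    guards: dict[str, float]     # guard metric -> minimum acceptable value


def decide(policy: Policy, baseline: dict[str, float], candidate: dict[str, float]) -> str:
    """Return "keep" only if the target improved enough and no guard was violated."""
    gain = candidate[policy.target_metric] - baseline[policy.target_metric]
    if gain < policy.min_improvement:
        return "rollback"
    for metric, floor in policy.guards.items():
        if candidate[metric] < floor:
            return "rollback"
    return "keep"


policy = Policy(target_metric="accuracy", min_improvement=0.02, guards={"safety": 0.95})
baseline = {"accuracy": 0.80, "safety": 0.97}

# Target improved and the guard metric stayed above its floor.
print(decide(policy, baseline, {"accuracy": 0.85, "safety": 0.96}))
# Target improved, but the guard metric dropped below its floor.
print(decide(policy, baseline, {"accuracy": 0.85, "safety": 0.90}))
```

In this sketch a guard violation overrides any gain on the target metric, which is one plausible reading of "guard checks"; the skill may weigh them differently.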

Can I use this with my own evaluation infrastructure?

Yes, the skill is designed to work with local evaluation contracts in YAML or JSON format and includes helper scripts to resolve these contracts and validate live objects.
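To make the idea of a local evaluation contract concrete, the following Python sketch loads a JSON contract and runs a basic consistency check. The key names (`dataset`, `metrics`, `thresholds`) and the `load_contract` function are assumptions made for illustration only; they are not the skill's documented schema or its helper scripts. A YAML contract could be handled the same way with a YAML parser in place of `json.loads`.

```python
import json

# Hypothetical schema: these key names are assumptions, not the skill's format.
REQUIRED_KEYS = {"dataset", "metrics", "thresholds"}


def load_contract(text: str) -> dict:
    """Parse a JSON contract and check that it is internally consistent."""
    contract = json.loads(text)
    missing = REQUIRED_KEYS - contract.keys()
    if missing:
        raise ValueError(f"contract is missing keys: {sorted(missing)}")
    # Every threshold must refer to a metric the contract declares.
    unknown = set(contract["thresholds"]) - set(contract["metrics"])
    if unknown:
        raise ValueError(f"thresholds reference undeclared metrics: {sorted(unknown)}")
    return contract


example = """
{
  "dataset": "support-tickets-v1",
  "metrics": ["accuracy", "latency_ms"],
  "thresholds": {"accuracy": 0.85}
}
"""

contract = load_contract(example)
print(contract["metrics"])
```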

What is an optimization lever in this context?

A lever is a specific change applied to an agent, such as a prompt adjustment, tool definition update, or parameter change, designed to influence performance metrics.

How do the 'single' and 'multi' lever modes work?

Single mode tests one change at a time, so any metric shift can be attributed to that change with high confidence. Multi mode tests up to 5 concurrent changes for faster iteration cycles.
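The difference between the two modes can be sketched as a simple validation step in Python. The `Lever` class, the `validate_levers` function, and the mode strings are hypothetical names chosen for this example; only the limit of 5 concurrent changes comes from the description above.

```python
from dataclasses import dataclass

MAX_MULTI_LEVERS = 5  # "up to 5 concurrent changes", per the answer above


@dataclass(frozen=True)
class Lever:
    # Hypothetical representation of a single change applied to an agent.
    kind: str          # e.g. "prompt", "tool", "parameter"
    description: str


def validate_levers(mode: str, levers: list[Lever]) -> None:
    """Raise ValueError if the number of levers does not fit the chosen mode."""
    if mode == "single":
        if len(levers) != 1:
            raise ValueError("single mode expects exactly one lever")
    elif mode == "multi":
        if not 1 <= len(levers) <= MAX_MULTI_LEVERS:
            raise ValueError(f"multi mode expects 1 to {MAX_MULTI_LEVERS} levers")
    else:
        raise ValueError(f"unknown mode: {mode!r}")


prompt_fix = Lever(kind="prompt", description="Clarify the citation instructions")
temperature = Lever(kind="parameter", description="Lower the sampling temperature")

validate_levers("single", [prompt_fix])               # passes
validate_levers("multi", [prompt_fix, temperature])   # passes
```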

Agentic Optimization Loop

Name: Agentic Optimization Loop
Author: mberto10

Data Science & ML

Automates the iterative refinement of AI agent performance through contract-driven evaluation and multi-lever optimization protocols.

The Optimization Loop skill provides a structured framework for improving AI agent behavior through a standardized, five-stage protocol: Diagnose, Hypothesize, Experiment, Analyze, and Compound/Decide. By integrating with evaluation contracts and metrics, it lets developers systematically test changes, from a single prompt edit to multi-variable adjustments, while maintaining strict performance guardrails. Automated failure analysis and rollback criteria ensure that only verified improvements are kept, which suits high-stakes agentic development where reliability is paramount.
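One way to picture the five-stage protocol is as an ordered pipeline in which each stage receives the state left by the previous one. The Python sketch below is an assumption-laden illustration: the `Stage` enum, the `run_iteration` function, and the dictionary-based state are invented for this example and do not reflect how the skill is actually implemented.

```python
from enum import Enum
from typing import Callable


class Stage(Enum):
    # Stage names follow the description above; the enum itself is hypothetical.
    DIAGNOSE = "diagnose"
    HYPOTHESIZE = "hypothesize"
    EXPERIMENT = "experiment"
    ANALYZE = "analyze"
    COMPOUND_DECIDE = "compound/decide"


Handler = Callable[[dict], dict]


def run_iteration(handlers: dict[Stage, Handler], state: dict) -> dict:
    """Run one loop iteration, calling each stage's handler in definition order."""
    for stage in Stage:
        state = handlers[stage](state)
        state.setdefault("history", []).append(stage.value)
    return state


# Placeholder handlers that pass the state through unchanged.
handlers = {stage: (lambda s: s) for stage in Stage}
result = run_iteration(handlers, {"agent": "demo-agent"})
print(result["history"])
```

In practice each handler would do real work, for example reading traces in the Diagnose stage or applying the keep-or-rollback rule in the Compound/Decide stage.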

Key Features

01 Contract-driven evaluation for consistent agent performance measurement

02 Strict rollback policies and guardrail checks to prevent performance regressions

03 Deep diagnostic integration for trace-level retrieval and metric normalization

04 Automated failure analysis and trend diagnosis against performance baselines

05 Configurable lever cardinality supporting both single and multi-variable experiments

Use Cases

01 Automating the prompt engineering loop to improve RAG retrieval accuracy

02 Iteratively optimizing system instructions based on specific failure patterns identified in traces

03 Systematically testing and refining agentic workflows across multiple model versions

Install with 🐟 Skill.Fish

npx skillfish add mberto10/mberto-compound optimization-loop

The skill can also be downloaded for use in Claude.ai and ChatGPT.