How does this skill help reduce AI hallucinations?

By creating specific LLM graders and test cases for known failure modes, you can systematically identify, measure, and mitigate hallucinations through targeted optimization cycles.
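As a rough illustration of what a grader for one failure mode might look like, the sketch below measures how often an agent's answer is not supported by its source context. Everything named here (`GraderCase`, `GRADER_PROMPT`, `hallucination_rate`, `stub_judge`) is hypothetical and not part of the skill itself; the judge is a plain callable so that a real LLM call can be swapped in, and the stub only does a substring check so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GraderCase:
    question: str
    context: str  # source text the answer must be grounded in
    answer: str   # agent output under test


# Hypothetical judge signature: takes a prompt, returns "PASS" or "FAIL".
Judge = Callable[[str], str]

GRADER_PROMPT = (
    "Context:\n{context}\n\nQuestion: {question}\nAnswer: {answer}\n\n"
    "Reply PASS if every claim in the answer is supported by the context, "
    "otherwise reply FAIL."
)


def hallucination_rate(cases: list[GraderCase], judge: Judge) -> float:
    """Return the fraction of cases the judge does not mark as PASS."""
    if not cases:
        return 0.0
    failures = 0
    for case in cases:
        prompt = GRADER_PROMPT.format(
            context=case.context, question=case.question, answer=case.answer
        )
        if judge(prompt).strip().upper() != "PASS":
            failures += 1
    return failures / len(cases)


def stub_judge(prompt: str) -> str:
    """Offline stand-in for an LLM judge: passes only if the answer text
    appears verbatim in the context. A real grader would call a model."""
    context = prompt.split("Context:\n", 1)[1].split("\n\nQuestion:", 1)[0]
    answer = prompt.split("Answer: ", 1)[1].split("\n\nReply", 1)[0]
    return "PASS" if answer in context else "FAIL"


cases = [
    GraderCase("When was Acme founded?", "Acme was founded in 1999.", "founded in 1999"),
    GraderCase("When was Acme founded?", "Acme was founded in 1999.", "founded in 2005"),
]
rate = hallucination_rate(cases, stub_judge)
print(f"hallucination rate: {rate:.2f}")
```

Tracking this single number before and after each change is what turns a known failure mode into something that can be measured across optimization cycles.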

What are the phases of the optimization loop?

The loop consists of four stages: Hypothesize (plan the change), Experiment (run the test), Analyze (check the results), and Compound (capture and document the learnings).
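The four stages can be sketched as a single function that runs one cycle and records it. This is only an assumed shape for illustration: `Experiment`, `Journal`, and `run_cycle` are hypothetical names, and the `evaluate` callable stands in for whatever evaluation suite produces a score.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Experiment:
    hypothesis: str
    baseline: float
    score: float

    @property
    def improved(self) -> bool:
        return self.score > self.baseline


@dataclass
class Journal:
    baseline: float
    entries: list[Experiment] = field(default_factory=list)


def run_cycle(journal: Journal, hypothesis: str,
              evaluate: Callable[[], float]) -> Experiment:
    # Hypothesize: the change and its expected effect are stated up front.
    # Experiment: run the evaluation with the change applied.
    score = evaluate()
    # Analyze: compare the new score against the current baseline.
    result = Experiment(hypothesis, journal.baseline, score)
    # Compound: record the outcome either way; only an improvement
    # becomes the new baseline.
    journal.entries.append(result)
    if result.improved:
        journal.baseline = score
    return result


journal = Journal(baseline=0.70)
first = run_cycle(journal, "Requiring citations reduces unsupported claims", lambda: 0.78)
second = run_cycle(journal, "A shorter system prompt improves accuracy", lambda: 0.75)
print(journal.baseline, [e.improved for e in journal.entries])
```

Failed experiments are kept in the journal on purpose, so that a rejected hypothesis is documented rather than retried later.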

What is the primary benefit of using the Agentic Optimization Craft skill?

It replaces 'shotgun debugging' with a systematic, data-driven process for improving AI agents, ensuring every change is validated by measurable evidence.

Do I need an existing dataset to use this skill?

No, the skill includes a specialized 'Bootstrap' phase designed to help you create datasets and automated graders from your existing production traces or freeform feedback.
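To make the idea concrete, here is a minimal sketch of turning traces into dataset items. The trace fields (`input`, `output`, `feedback`) and the `bootstrap_dataset` function are assumptions made for this example, not the skill's actual data model or the Langfuse schema.

```python
def bootstrap_dataset(traces: list[dict]) -> list[dict]:
    """Build evaluation items from traces that carry human feedback.

    Traces without feedback are skipped, and repeated inputs are
    deduplicated (case-insensitively) so each case appears once.
    """
    seen: set[str] = set()
    items: list[dict] = []
    for trace in traces:
        feedback = trace.get("feedback")
        if not feedback:
            continue  # nothing to learn from without an annotation
        key = trace["input"].strip().lower()
        if key in seen:
            continue  # keep only the first occurrence of each input
        seen.add(key)
        items.append({
            "input": trace["input"],
            "observed_output": trace["output"],
            "note": feedback,
        })
    return items


traces = [
    {"input": "Refund policy?", "output": "30 days", "feedback": "Wrong, it is 14 days"},
    {"input": "Opening hours?", "output": "9 to 5", "feedback": None},
    {"input": "refund policy? ", "output": "60 days", "feedback": "Also wrong"},
]
dataset = bootstrap_dataset(traces)
print(len(dataset), dataset[0]["note"])
```

Each resulting item pairs a real input with the feedback it received, which is the raw material a grader can later be written against.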

Does it integrate with third-party observability tools?

Yes, it is specifically optimized to work with Langfuse for managing traces, annotations, and dataset versioning.

Agentic Optimization Craft

Author: mberto10

Data Science and ML

Implements a systematic, hypothesis-driven methodology for the iterative improvement of AI agents through automated evaluations and data-driven feedback loops.

Agentic Optimization Craft is a framework for moving AI development beyond trial-and-error prompt engineering. It guides developers through an 'evaluation-first' workflow in which testing infrastructure is built directly from production traces or human feedback. By following a structured loop of hypothesizing, experimenting, and analyzing results, users can achieve compounding improvements in agent quality. The skill is particularly useful for teams that want to keep a persistent optimization journal, automate LLM grading, and validate each change to their agentic systems against measured results.

Key Features

01 Seamless integration with Langfuse for trace retrieval and annotation management

02 Hypothesis-driven iteration loop for systematic agent performance gains

03 Automated evaluation bootstrapping from production traces and human feedback

04 Structured optimization journaling to track baseline metrics and historical experiments

05 Guidance for creating specialized LLM-based graders and curated evaluation datasets

Use Cases

01 Scaling agent development through a repeatable, data-driven optimization process

02 Establishing a baseline and automated evaluation suite for new AI applications

03 Improving agent accuracy and task completion rates using real-world failure cases

Install with 🐟 Skill.Fish

npx skillfish add mberto10/mberto-compound agentic-optimization-craft

For use in Claude.ai and ChatGPT
