Can this skill help me improve my AI evaluation tests?

Yes, it includes an Eval Performance Review flow that identifies gaps in your test coverage, such as missing edge cases or adversarial scenarios encountered in the wild.

How does the 'Agency Promotion' workflow work?

It provides a standardized checklist covering quality metrics, safety, trust, and operational readiness to determine if an AI feature is ready for a higher level of autonomy.
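As a rough illustration only, a promotion gate over those four checklist areas might look like the sketch below. The class name, field names, and the all-areas-must-pass rule are assumptions made for this example; the listing does not describe how the skill itself stores or scores the checklist.

```python
from dataclasses import dataclass


@dataclass
class PromotionChecklist:
    # One flag per checklist area named in the listing.
    # The boolean pass/fail representation is a hypothetical simplification.
    quality_metrics: bool
    safety: bool
    trust: bool
    operational_readiness: bool


def failing_areas(checklist: PromotionChecklist) -> list[str]:
    """Return the names of checklist areas that have not passed."""
    return [name for name, passed in vars(checklist).items() if not passed]


def ready_for_promotion(checklist: PromotionChecklist) -> bool:
    """Assumed rule: promote only when every area passes."""
    return not failing_areas(checklist)


review = PromotionChecklist(
    quality_metrics=True,
    safety=True,
    trust=False,
    operational_readiness=True,
)
blockers = failing_areas(review)
decision = ready_for_promotion(review)
```

Returning the list of failing areas, rather than a bare yes or no, keeps the decision explainable: the team can see what still blocks a higher autonomy level.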

What is the primary focus of the Calibrate skill?

The Calibrate skill focuses on post-launch refinement of AI features, helping teams learn from production data to improve quality rather than trying to achieve perfection before shipping.

What kind of error patterns does this skill track?

It helps you document and categorize common AI failures including hallucinations, tone mismatches, scope creep, missing information, and confidence miscalibration.
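To make the idea concrete, here is a minimal sketch of tallying logged failures by category so the most frequent pattern surfaces first. The category labels mirror the failure types listed above; the `ErrorRecord` layout, the `prioritize` helper, and the sample records are hypothetical and are not taken from the skill.

```python
from collections import Counter
from dataclasses import dataclass

# Labels follow the failure types named in the listing;
# the snake_case spelling is an assumption for this sketch.
CATEGORIES = {
    "hallucination",
    "tone_mismatch",
    "scope_creep",
    "missing_information",
    "confidence_miscalibration",
}


@dataclass
class ErrorRecord:
    category: str  # one of CATEGORIES
    note: str      # free-text description of what went wrong


def prioritize(records: list[ErrorRecord]) -> list[tuple[str, int]]:
    """Return (category, count) pairs, most frequent first."""
    for record in records:
        if record.category not in CATEGORIES:
            raise ValueError(f"unknown category: {record.category}")
    return Counter(r.category for r in records).most_common()


# Illustrative records, not real production data.
records = [
    ErrorRecord("hallucination", "cited a nonexistent policy"),
    ErrorRecord("tone_mismatch", "too casual for a billing dispute"),
    ErrorRecord("hallucination", "invented a release date"),
]
ranked = prioritize(records)
```

Rejecting unknown labels keeps the taxonomy fixed, so counts stay comparable from one calibration cycle to the next.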

Is this skill based on an established framework?

Yes, it is based on the Continuous Calibration/Continuous Development (CC/CD) framework adapted for modern AI-era product management.

AI Feature Calibration

Name: AI Feature Calibration
Author: breethomas


Productivity & Workflow

Refines post-launch AI feature performance through systematic error analysis, evaluation reviews, and autonomy level adjustments.

The Calibrate skill implements a structured workflow for the post-launch phase of AI product development, moving away from pre-launch over-optimization toward real-world refinement. Grounded in the Continuous Calibration/Continuous Development (CC/CD) framework, it enables developers and product managers to document production error patterns, perform gap analyses on existing evaluation suites, and determine when a feature is ready for higher levels of autonomy. This skill is essential for teams looking to bridge the gap between initial deployment and production-grade reliability by turning user corrections and failures into actionable calibration data.

Key Features

01 Systematic documentation and categorization of AI error patterns such as hallucinations or context misses

02 10 GitHub stars

03 Automated generation of calibration reports and prioritized action plans

04 Eval performance reviews to identify gaps in test coverage based on production data

05 Agency promotion decision framework to safely increase AI autonomy levels

06 Quick health checks for monitoring quality trends, override rates, and user signals

Use Cases

01 Determining if an AI feature is stable enough to move from human-in-the-loop to autonomous operation

02 Conducting monthly eval audits to ensure test suites evolve alongside actual user behavior

03 Analyzing user feedback and support tickets to pinpoint and fix systemic AI failures

Install with 🐟 Skill.Fish

npx skillfish add breethomas/pm-thought-partner calibrate

For use in Claude.ai and ChatGPT