Can this skill be used for routine maintenance?

No, this skill is specifically designed for unplanned incidents, outages, and performance regressions where the cause is not immediately obvious.

Does it integrate with other Arsyn debugging tools?

Yes, it is designed to work alongside arsyn:debugging for deep code tracing and arsyn:verifying to confirm that a fix has successfully returned metrics to baseline.

How does the Runbook skill improve incident response?

It enforces a structured protocol that prioritizes evidence gathering over guesswork, ensuring that engineers diagnose the true root cause before applying potentially harmful fixes.

What are the five phases of the Runbook process?

The process consists of Intake (symptom capture), Triage (severity assessment), Investigation (evidence gathering), Resolution (applying the fix), and Documentation (post-incident reporting).
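As a rough illustration of that ordering, the five phases can be modeled as an ordered enumeration. This is a minimal sketch, not the skill's actual implementation: the `Phase` enum and the `next_phase` helper are hypothetical names introduced here, and only the phase names and their descriptions come from the listing.

```python
from enum import Enum


class Phase(Enum):
    """The five Runbook phases, in the order the listing gives them."""
    INTAKE = "symptom capture"
    TRIAGE = "severity assessment"
    INVESTIGATION = "evidence gathering"
    RESOLUTION = "applying the fix"
    DOCUMENTATION = "post-incident reporting"


def next_phase(current):
    """Return the phase that follows `current`, or None after the last one.

    Hypothetical helper: Enum members iterate in definition order,
    so the list index encodes the position in the process.
    """
    phases = list(Phase)
    index = phases.index(current)
    return phases[index + 1] if index + 1 < len(phases) else None


if __name__ == "__main__":
    phase = Phase.INTAKE
    while phase is not None:
        print(f"{phase.name.title()}: {phase.value}")
        phase = next_phase(phase)
```

Modeling the phases as a fixed sequence makes it awkward to jump straight from Intake to Resolution, which is the kind of shortcut the process is meant to discourage.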

Incident Investigation Runbook

Name: Incident Investigation Runbook
Author: renathoaz


Analytics and Monitoring

Provides a disciplined, evidence-based framework for investigating and resolving production incidents and service outages.

The Runbook skill transforms AI agents into disciplined engineers by enforcing a structured five-phase incident response process: intake, triage, investigation, resolution, and documentation. It prevents the common pitfall of 'guessing' fixes by mandating evidence collection, timeline mapping, and systematic hypothesis testing. Whether dealing with service outages, error spikes, or performance degradation, this skill ensures that root causes are identified and verified before any production changes are applied, leading to more stable systems and better documentation.
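One way to picture the "evidence before changes" rule is as a simple gate in code. The sketch below is an assumption-laden illustration, not the skill's own logic: `Evidence`, `Hypothesis`, `build_timeline`, and `may_apply_fix` are hypothetical names, and the rule "at least one verified hypothesis backed by evidence" is one possible reading of the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Evidence:
    """A single observation, such as a log line or a metric reading."""
    observed_at: datetime
    description: str


@dataclass
class Hypothesis:
    """A candidate root cause and the evidence collected for it."""
    statement: str
    evidence: list = field(default_factory=list)
    verified: bool = False


def build_timeline(items):
    """Order evidence chronologically so cause and effect are easier to see."""
    return sorted(items, key=lambda e: e.observed_at)


def may_apply_fix(hypotheses):
    """Assumed gate: allow a production change only when at least one
    hypothesis is both verified and supported by collected evidence."""
    return any(h.verified and bool(h.evidence) for h in hypotheses)


if __name__ == "__main__":
    # Illustrative data only; not taken from a real incident.
    deploy = Evidence(datetime(2024, 1, 1, 12, 0), "new release deployed")
    spike = Evidence(datetime(2024, 1, 1, 12, 5), "error rate rises")
    guess = Hypothesis("database is overloaded")
    tested = Hypothesis("release introduced a regression",
                        evidence=build_timeline([spike, deploy]),
                        verified=True)
    print(may_apply_fix([guess]))
    print(may_apply_fix([guess, tested]))
```

An untested guess alone does not pass the gate; a hypothesis that has been verified against a timeline of observations does.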

Key Features

01 Structured 5-phase incident response workflow (Intake to Documentation)

02 Evidence-based hypothesis testing to prevent premature fixes

03 Automated triage logic for severity and blast radius assessment

04 GitHub stars: 2

05 Post-incident report generation for future knowledge sharing

06 Strategic mitigation options including rollbacks and scaling

Use Cases

01 Generating comprehensive post-mortem documentation after critical incidents

02 Investigating unexplained error rate spikes or regression patterns

03 Resolving production service outages and performance degradation
