Automates structured incident response for production outages and service degradations within the Claude Code environment.
The Incident Response skill provides a disciplined, step-by-step framework for handling production failures, ensuring that system containment always precedes deep investigation. By integrating with monitoring, security, and root-cause analysis tools, it guides developers through triage, blast-radius containment, and automated postmortem generation to prevent recurring issues. This skill is essential for teams using Claude Code who need to maintain high availability and professional SRE standards while avoiding the common anti-pattern of debugging during active outages.
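As a rough illustration of the containment-first ordering described above, the following minimal Python sketch refuses to start investigation until containment has been recorded. The `Phase` and `Incident` names, the phase list, and the `advance` method are all hypothetical and are not taken from the skill itself; they only show one way such an ordering rule could be enforced.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    # Hypothetical phase names, ordered as the description suggests.
    TRIAGE = 1
    CONTAINMENT = 2
    INVESTIGATION = 3
    POSTMORTEM = 4


@dataclass
class Incident:
    title: str
    severity: str  # "P1", "P2", or "P3" (assumed labels)
    completed: list = field(default_factory=list)

    def advance(self, phase: Phase) -> None:
        """Record a phase, refusing to skip or reorder earlier phases."""
        if len(self.completed) == len(Phase):
            raise RuntimeError("incident already closed")
        # The next allowed phase is the one right after the last completed one.
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise RuntimeError(
                f"cannot start {phase.name} before {expected.name}"
            )
        self.completed.append(phase)
```

With this sketch, calling `advance(Phase.INVESTIGATION)` directly after triage raises an error, which mirrors the rule that debugging waits until production is stabilized.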
Key Features
01 Standardized postmortem documentation with actionable prevention tasks
02 Automated severity-based triage (P1-P3) with specific impact indicators
03 Integration with Watchdog and Sentinel for health and security verification
04 63 GitHub stars
05 Containment-first workflow to stabilize production before debugging begins
06 Automated incident timeline construction and ADR generation
Use Cases
01 Managing a critical production outage (P1) with a strict containment-first protocol
02 Generating comprehensive incident reports and postmortems for engineering retrospectives
03 Executing automated rollbacks or traffic shifts during post-deploy failures