Manages technology incidents from detection through resolution with automated triage, timeline reconstruction, and post-incident analysis.
The Incident Commander skill gives SRE and DevOps teams a structured framework for handling service outages and degradations. It automates the most time-consuming parts of incident management: severity classification (SEV1-SEV4), chronological timeline generation from disparate logs, and detailed Post-Incident Reviews (PIRs) built on RCA frameworks such as the 5 Whys. Pre-built stakeholder communication templates and dynamic runbook generation let engineers focus on technical resolution while keeping communication transparent and the process consistent across the organization.
Key Features
01 Dynamic runbook generation from identified incident patterns
02 Structured Post-Incident Review (PIR) generation with Root Cause Analysis
03 Chronological timeline reconstruction from multi-source logs and events
04 Automated severity classification based on impact and urgency metrics
05 9,958 GitHub stars
06 Pre-built communication templates for stakeholders and customer status updates
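As an illustration of severity classification from impact and urgency, the sketch below maps two coarse ratings onto the SEV1-SEV4 scale. It is not the skill's actual logic: the `Level` enum, the `classify_severity` function, and the additive scoring thresholds are all assumptions chosen for the example.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step rating used for both impact and urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify_severity(impact: Level, urgency: Level) -> str:
    """Map impact and urgency to SEV1-SEV4.

    Assumed rule: add the two ratings (range 2-6) and bucket the sum.
    Real teams usually tune these thresholds to their own SLAs.
    """
    score = int(impact) + int(urgency)
    if score >= 6:
        return "SEV1"  # high impact and high urgency
    if score == 5:
        return "SEV2"
    if score == 4:
        return "SEV3"
    return "SEV4"      # low combined impact and urgency


if __name__ == "__main__":
    print(classify_severity(Level.HIGH, Level.HIGH))
    print(classify_severity(Level.MEDIUM, Level.LOW))
```

An additive score keeps the rule easy to audit; a lookup table keyed on `(impact, urgency)` would work equally well if the mapping needs to be asymmetric.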
Use Cases
01 Reconstructing complex event narratives for post-mortem analysis following a system failure
02 Rapidly triaging a high-severity production outage and establishing a war room response
03 Generating professional executive summaries and customer-facing status updates during active incidents
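Timeline reconstruction from multiple sources can be pictured as normalizing timestamps to one time zone and merging the events in order. The sketch below shows that idea only; the `Event` type, the `parse_ts` and `build_timeline` helpers, and the sample events are hypothetical and do not come from the skill itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized event; real log formats will differ."""
    timestamp: datetime
    source: str
    message: str


def parse_ts(text: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert to UTC."""
    return datetime.fromisoformat(text).astimezone(timezone.utc)


def build_timeline(*sources: Iterable[Event]) -> List[Event]:
    """Merge events from any number of sources into chronological order.

    sorted() is stable, so events with identical timestamps keep the
    order in which their sources were passed in.
    """
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda event: event.timestamp)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    alerts = [Event(parse_ts("2024-01-01T10:05:00+00:00"), "alerts", "High error rate")]
    deploys = [Event(parse_ts("2024-01-01T12:00:00+02:00"), "deploys", "Release rolled out")]
    for event in build_timeline(alerts, deploys):
        print(event.timestamp.isoformat(), event.source, event.message)
```

Converting everything to UTC before sorting matters here: the deploy event carries a later wall-clock time but a `+02:00` offset, so it actually precedes the alert.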