关于
This skill provides production-ready templates and standardized workflows for managing system incidents, from initial detection and triage to mitigation and post-mortem communication. It equips teams with structured response patterns for common failure scenarios like service outages and database issues, ensuring that engineers have clear, actionable steps, CLI commands, and communication templates to restore service quickly even under high-pressure situations. It is an essential tool for SREs and DevOps teams looking to formalize their on-call procedures and minimize Mean Time to Resolution (MTTR).