Introduction
This skill equips developers and SREs with standardized templates for managing system incidents, from initial detection to final resolution. It provides structured frameworks for handling service outages and database failures, including severity level definitions, escalation matrices, and pre-written CLI commands for Kubernetes and PostgreSQL troubleshooting. By using these templates, teams can establish reliable on-call procedures, ensure consistent communication during high-pressure events, and reduce Mean Time to Resolution (MTTR).
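To make the idea of severity levels and an escalation matrix concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption, not taken from the skill's actual templates: the level names (`SEV1` to `SEV3`), the role names, the acknowledgement deadlines, and the helper `escalation_for` are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Severity(IntEnum):
    """Hypothetical severity levels; a lower number means more urgent."""
    SEV1 = 1  # assumed meaning: full outage or data loss
    SEV2 = 2  # assumed meaning: major degradation
    SEV3 = 3  # assumed meaning: minor or partial impact


@dataclass(frozen=True)
class EscalationRule:
    notify: Tuple[str, ...]  # roles to page (hypothetical role names)
    ack_minutes: int         # assumed deadline to acknowledge the page


# Hypothetical escalation matrix: one rule per severity level.
ESCALATION_MATRIX = {
    Severity.SEV1: EscalationRule(("on-call-primary", "incident-commander"), 5),
    Severity.SEV2: EscalationRule(("on-call-primary",), 15),
    Severity.SEV3: EscalationRule(("team-queue",), 60),
}


def escalation_for(severity: Severity) -> EscalationRule:
    """Look up who to notify and how quickly for a given severity."""
    return ESCALATION_MATRIX[severity]
```

A caller could use `escalation_for(Severity.SEV1)` to find the roles to page and the acknowledgement deadline. Keeping the matrix as plain data means a team can review and change it without touching the lookup logic.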