Can I customize the escalation matrix for my team?

Yes, the templates are designed to be adapted to your specific organizational structure, including custom Slack channels, PagerDuty rotations, and management contact info.
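
As a rough illustration of what an adapted matrix might look like, here is a minimal sketch in Python. It is not taken from the skill's templates: the `EscalationStep` type, the `contacts_due` helper, and every contact, channel name, and timing below are hypothetical placeholders to replace with your own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes since the incident was declared
    contact: str        # who gets notified at this step
    channel: str        # where the notification is sent


# Hypothetical matrix: all names, channels, and delays are placeholders.
ESCALATION_MATRIX = {
    "SEV1": [
        EscalationStep(0, "primary on-call", "pagerduty:payments-primary"),
        EscalationStep(15, "secondary on-call", "pagerduty:payments-secondary"),
        EscalationStep(30, "engineering manager", "slack:#incident-payments"),
    ],
    "SEV2": [
        EscalationStep(0, "primary on-call", "pagerduty:payments-primary"),
        EscalationStep(30, "engineering manager", "slack:#incident-payments"),
    ],
}


def contacts_due(severity: str, minutes_elapsed: int) -> list[EscalationStep]:
    """Return every step whose delay has already passed for this severity."""
    steps = ESCALATION_MATRIX.get(severity, [])
    return [step for step in steps if step.after_minutes <= minutes_elapsed]
```

Keeping the matrix as plain data means a team can swap in its own rotations and channels without touching the lookup logic.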

Does this skill provide specific CLI commands for troubleshooting?

Yes, it includes pre-written Kubernetes (kubectl) and PostgreSQL (psql) commands for rapid triage, logging, and scaling during an active incident.
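
The sketch below gives a sense of the kind of first-pass triage commands this refers to. It is an assumption about what such a set could contain, not a copy of the skill's runbooks. The `triage_commands` helper and the namespace, deployment, and database names are hypothetical; the `kubectl` and `psql` invocations themselves use standard subcommands and flags. The function only builds the commands and does not run them.

```python
from __future__ import annotations

import shlex


def triage_commands(namespace: str, deployment: str, database: str) -> list[list[str]]:
    """Build argv lists for a first-pass triage; nothing is executed here."""
    target = f"deployment/{deployment}"
    return [
        # Pod status for the affected namespace.
        ["kubectl", "get", "pods", "-n", namespace],
        # Recent log lines from a pod in the deployment.
        ["kubectl", "logs", target, "-n", namespace, "--tail=100"],
        # Revision history, useful when a bad deploy is suspected.
        ["kubectl", "rollout", "history", target, "-n", namespace],
        # Connection counts by state, a quick check for pool exhaustion.
        ["psql", "-d", database, "-c",
         "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"],
    ]


if __name__ == "__main__":
    # Hypothetical names, printed as copy-pasteable shell lines.
    for argv in triage_commands("prod", "checkout-api", "orders"):
        print(shlex.join(argv))
```

Building the commands as argument lists keeps quoting explicit, so the SQL string survives intact whether it is printed for an engineer to paste or later handed to `subprocess.run`.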

What types of incidents are covered by these templates?

The skill includes templates for general service outages, high latency, traffic surges, and specific database issues like connection pool exhaustion or replication lag.

Is this skill useful for compliance and audits?

Yes. Compliance frameworks such as SOC 2 expect documented, standardized incident response procedures, and these templates give you a written record of how incidents are classified, escalated, and resolved.

How do these runbooks help reduce MTTR (Mean Time To Recovery)?

By providing a structured framework and clear decision trees, teams can reduce cognitive load during a crisis and avoid manual errors, leading to faster service restoration.

Incident Response Runbooks

Name: Incident Response Runbooks
Author: nguyendinhquocx

Analytics & Monitoring

Generates structured incident response runbooks with step-by-step procedures, escalation matrices, and recovery actions for production environments.

About

The Incident Response Runbooks skill provides a framework for managing system outages and service degradations. Its templates cover the entire incident lifecycle: initial detection, triage, mitigation, resolution, and post-mortem communication. By standardizing severity levels and supplying ready-to-use CLI commands for Kubernetes and database environments, the skill helps engineering teams reduce mean time to recovery (MTTR) and keep communication clear during high-pressure production events.

Key Features

Detailed service outage templates with Kubernetes-specific triage and mitigation steps
Automated communication templates for internal updates and stakeholder notifications
Standardized incident severity classifications (SEV1-SEV4) with response time targets
Database-specific runbooks for connection exhaustion, replication lag, and disk space issues
Pre-defined escalation matrices and rollback procedures to minimize business impact
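
To make the idea of severity classifications with response time targets concrete, here is a minimal sketch. The descriptions and target durations are placeholders invented for this example, since the listing does not state the skill's actual values, and the `Severity`, `response_deadline`, and `is_breached` names are likewise hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Severity:
    name: str
    description: str
    respond_within: timedelta  # target time to first response


# Placeholder definitions and targets, chosen only to make the example concrete.
SEVERITIES = {
    "SEV1": Severity("SEV1", "complete outage", timedelta(minutes=5)),
    "SEV2": Severity("SEV2", "major degradation", timedelta(minutes=15)),
    "SEV3": Severity("SEV3", "minor impact", timedelta(hours=1)),
    "SEV4": Severity("SEV4", "low impact", timedelta(hours=24)),
}


def response_deadline(level: str, declared_at: datetime) -> datetime:
    """Latest time by which someone should have responded."""
    return declared_at + SEVERITIES[level].respond_within


def is_breached(level: str, declared_at: datetime, now: datetime) -> bool:
    """True once the response target has been missed."""
    return now > response_deadline(level, declared_at)
```

For instance, with these placeholder targets a SEV1 declared at 03:00 has a response deadline of 03:05, and `is_breached` starts returning `True` for any later timestamp.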

Use Cases

Establishing standard operating procedures (SOPs) for new microservices and cloud infrastructure
Creating '3 AM-ready' on-call documentation that guides engineers through complex recovery tasks
Accelerating real-time incident response by providing instant access to diagnostic commands and mitigation scripts

Install with 🐟 Skill.Fish

npx skillfish add nguyendinhquocx/code-ai incident-runbook-templates

For use in Claude.ai and ChatGPT