Can these runbooks be customized for my specific tech stack?

Yes, the templates are designed as modular frameworks that can be easily adapted for specific infrastructure like Kubernetes, AWS, or various database engines.

What types of failure scenarios are included?

The skill covers general service outages, database connection pool exhaustion, replication lag, disk space crises, and traffic surges.

How does this improve team coordination?

A standardized escalation matrix and communication templates keep all stakeholders, from engineering managers to customer support, informed throughout an incident.
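As an illustration of what an escalation matrix keyed on the SEV1-SEV4 levels might look like, here is a minimal Python sketch. It is not taken from the skill itself: the role names, acknowledgement deadlines, and the `EscalationRule`, `who_to_notify`, and `status_update` names are all hypothetical placeholders to be replaced with your team's own values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRule:
    severity: str            # severity label, e.g. "SEV1"
    notify: tuple[str, ...]  # hypothetical role names to notify
    ack_minutes: int         # hypothetical acknowledgement deadline in minutes


# Hypothetical matrix: roles and deadlines are placeholders, not the skill's values.
ESCALATION_MATRIX = {
    "SEV1": EscalationRule(
        "SEV1", ("on-call engineer", "engineering manager", "customer support"), 5
    ),
    "SEV2": EscalationRule("SEV2", ("on-call engineer", "engineering manager"), 15),
    "SEV3": EscalationRule("SEV3", ("on-call engineer",), 60),
    "SEV4": EscalationRule("SEV4", ("on-call engineer",), 240),
}


def lookup(severity: str) -> EscalationRule:
    """Return the rule for a severity label, case-insensitively."""
    rule = ESCALATION_MATRIX.get(severity.upper())
    if rule is None:
        raise ValueError(f"unknown severity: {severity!r}")
    return rule


def who_to_notify(severity: str) -> tuple[str, ...]:
    """Return the roles that should be notified for this severity."""
    return lookup(severity).notify


def status_update(severity: str, service: str, summary: str) -> str:
    """Fill a one-line internal status message from the matrix."""
    rule = lookup(severity)
    recipients = ", ".join(rule.notify)
    return (
        f"[{rule.severity}] {service}: {summary} "
        f"(notify: {recipients}; acknowledge within {rule.ack_minutes} min)"
    )


if __name__ == "__main__":
    print(status_update("sev2", "checkout-api", "elevated 5xx rate"))
```

Keeping the matrix as plain data means the same table can drive both paging decisions and the wording of status messages, so the two are less likely to drift apart.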

Does this skill help during a live incident?

Absolutely. It provides immediate access to triage checklists, diagnostic commands, and communication templates to help engineers maintain focus during high-pressure outages.

Incident Runbook Templates

Name: Incident Runbook Templates
Author: drgaciw

Analytics & Monitoring

Generates structured incident response runbooks with standardized procedures, escalation paths, and recovery actions for production environments.

This skill provides production-ready templates and standardized workflows for managing system incidents, from initial detection and triage to mitigation and post-mortem communication. It equips teams with structured response patterns for common failure scenarios such as service outages and database issues, giving engineers clear, actionable steps, CLI commands, and communication templates to restore service quickly, even under pressure. It is aimed at SREs and DevOps teams looking to formalize their on-call procedures and minimize Mean Time to Resolution (MTTR).

Key Features

01 Database-specific troubleshooting and recovery scripts

02 Production-ready service outage mitigation procedures

03 Kubernetes-based remediation and rollback strategies

04 0 GitHub stars

05 Standardized internal and external communication templates

06 Pre-defined incident severity (SEV1-SEV4) frameworks

Use Cases

01 Creating automated escalation matrices for team coordination

02 Developing on-call documentation for new microservices

03 Standardizing triage steps during active production outages

Install with 🐟 Skill.Fish

npx skillfish add drgaciw/academic-compliance-hub-glm incident-runbook-templates

For use in Claude.ai and ChatGPT
