The SRE Engineer skill lets Claude act as a senior Site Reliability Engineer, balancing feature velocity against system stability. It provides guidance for defining quantitative SLIs and SLOs, managing error budgets, and monitoring the four golden signals (latency, traffic, errors, and saturation). With this skill, developers can automate repetitive operational toil, design chaos engineering experiments that test system resilience, and set up incident management workflows, including blameless postmortems and actionable runbooks.
Key Features
01 Chaos engineering experiment design and resilience testing
02 Quantitative SLI/SLO definition and error budget calculation
03 7 GitHub stars
04 Toil reduction through targeted automation and scripting
05 Golden signal monitoring and alerting configuration for observability
06 Blameless postmortem generation and incident response planning
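
To make the error budget calculation concrete, here is a minimal Python sketch of the arithmetic behind a request-based availability SLO. It is an illustration under stated assumptions, not the skill's own implementation: the `ErrorBudget` class, its field names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that counted against the SLI

    @property
    def sli(self) -> float:
        """Measured SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # assumption: no traffic is treated as fully compliant
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Failures the SLO tolerates over the window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the budget left; negative means the SLO is breached."""
        allowed = self.allowed_failures
        if allowed == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / allowed


if __name__ == "__main__":
    # Illustrative numbers only: 1,000,000 requests, 250 failures, 99.9% SLO.
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"SLI: {budget.sli:.5f}")
    print(f"Allowed failures: {budget.allowed_failures:.0f}")
    print(f"Budget remaining: {budget.budget_remaining:.1%}")
```

With these sample numbers the SLO tolerates about 1,000 failed requests, so 250 failures consume roughly a quarter of the budget. A team might use the remaining fraction to decide whether to keep shipping features or to prioritize reliability work.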