About
This skill implements a rigorous Site Reliability Engineering (SRE) framework for managing production outages and service degradations. It guides teams through the critical phases of incident response: declaring the incident and mapping it to a severity level, coordinating triage with the 5 Whys technique, and applying systematic mitigation strategies. It requires documentation at every step and a verification period before an incident can be closed. This keeps the handling of SEV1 to SEV3 incidents consistent, shortens recovery time, and prevents premature resolution. It also supports blameless post-mortems that improve long-term system reliability.
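The lifecycle described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the skill's actual implementation. The severity thresholds (`classify`), the phase names, and the per-severity verification windows are all hypothetical values chosen for the example. The sketch shows three of the rules the description mentions: every transition must carry a note (mandatory documentation), phases must be followed in order, and an incident cannot be resolved until mitigation has held for its verification window.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional, Tuple


class Severity(Enum):
    SEV1 = 1  # hypothetical: full outage or data loss
    SEV2 = 2  # hypothetical: major degradation
    SEV3 = 3  # hypothetical: minor degradation


# Hypothetical verification windows: how long a mitigation must hold before closing.
VERIFICATION_WINDOW = {
    Severity.SEV1: timedelta(minutes=60),
    Severity.SEV2: timedelta(minutes=30),
    Severity.SEV3: timedelta(minutes=15),
}


def classify(users_affected_pct: float, data_loss: bool) -> Severity:
    """Map impact to a severity level (thresholds are illustrative assumptions)."""
    if data_loss or users_affected_pct >= 50:
        return Severity.SEV1
    if users_affected_pct >= 10:
        return Severity.SEV2
    return Severity.SEV3


class Phase(Enum):
    DECLARED = "declared"
    TRIAGE = "triage"
    MITIGATING = "mitigating"
    VERIFYING = "verifying"
    RESOLVED = "resolved"


# Allowed transitions; verification may fall back to mitigation if the fix does not hold.
TRANSITIONS = {
    Phase.DECLARED: {Phase.TRIAGE},
    Phase.TRIAGE: {Phase.MITIGATING},
    Phase.MITIGATING: {Phase.VERIFYING},
    Phase.VERIFYING: {Phase.RESOLVED, Phase.MITIGATING},
    Phase.RESOLVED: set(),
}


@dataclass
class Incident:
    title: str
    severity: Severity
    phase: Phase = Phase.DECLARED
    # Documentation log: one (timestamp, phase, note) entry per transition.
    timeline: List[Tuple[datetime, str, str]] = field(default_factory=list)
    verifying_since: Optional[datetime] = None

    def advance(self, to: Phase, note: str, now: datetime) -> None:
        """Move to the next phase, enforcing order, documentation, and verification."""
        if to not in TRANSITIONS[self.phase]:
            raise ValueError(f"illegal transition {self.phase.value} -> {to.value}")
        if not note.strip():
            raise ValueError("every transition must be documented")
        if to is Phase.RESOLVED:
            # Only reachable from VERIFYING, so verifying_since is always set here.
            elapsed = now - self.verifying_since
            if elapsed < VERIFICATION_WINDOW[self.severity]:
                raise ValueError("verification window has not elapsed yet")
        # Start the verification clock on entry to VERIFYING; clear it otherwise.
        self.verifying_since = now if to is Phase.VERIFYING else None
        self.phase = to
        self.timeline.append((now, to.value, note))
```

For example, a SEV2 incident that enters `VERIFYING` at 12:10 cannot be marked `RESOLVED` at 12:20, because its (assumed) 30-minute window has not elapsed. The call raises `ValueError` and leaves the incident in `VERIFYING`. Rejecting the transition, rather than only logging a warning, is one way to enforce the "no premature resolution" rule.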