About
This skill provides a framework for Site Reliability Engineering (SRE) practices built around the hierarchy of SLIs, SLOs, and SLAs. Service level indicators (SLIs) measure how a service behaves, service level objectives (SLOs) set internal targets on those indicators, and service level agreements (SLAs) are external commitments with consequences attached. The skill helps developers and SREs implement measurable reliability targets using Prometheus recording rules, multi-window burn rate alerts, and formal error budget policies. Its structured templates and implementation patterns for availability and latency SLOs help teams balance feature velocity against system stability, and they support data-driven decisions about when to prioritize reliability work over new features.
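The error budget and burn rate arithmetic behind these alerts can be sketched in a few lines. This is a minimal Python sketch, assuming the error ratio for each alert window has already been computed upstream (for example by a Prometheus recording rule). The names `BurnRateRule`, `error_budget`, `burn_rate`, `should_alert`, and `budget_remaining` are hypothetical and are not part of this skill or any library. The 14.4 threshold with 1h/5m windows is a value often cited for paging on a 99.9% SLO measured over 30 days, and should be treated as an assumption to tune.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    """One multi-window alert condition (hypothetical structure)."""
    long_window: str   # e.g. "1h": filters out brief blips
    short_window: str  # e.g. "5m": lets the alert clear soon after recovery
    threshold: float   # burn-rate multiplier, e.g. 14.4


def error_budget(slo_target: float) -> float:
    """Allowed error ratio for an SLO target, e.g. 0.999 -> 0.001."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than sustainable the budget is being spent.

    A burn rate of 1.0 uses exactly the whole budget over the SLO period.
    """
    return error_ratio / error_budget(slo_target)


def should_alert(rule: BurnRateRule, slo_target: float,
                 long_error_ratio: float, short_error_ratio: float) -> bool:
    """Fire only if BOTH windows burn at or above the rule's threshold."""
    return (burn_rate(long_error_ratio, slo_target) >= rule.threshold
            and burn_rate(short_error_ratio, slo_target) >= rule.threshold)


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget left (negative once overspent)."""
    allowed_failures = error_budget(slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    page = BurnRateRule(long_window="1h", short_window="5m", threshold=14.4)
    # 2% errors over 1h and 3% over 5m against a 99.9% SLO: burn rates ~20 and ~30.
    print(should_alert(page, 0.999, long_error_ratio=0.02, short_error_ratio=0.03))
    # 250 failures out of 1,000,000 requests leaves about 75% of the budget.
    print(budget_remaining(0.999, total_requests=1_000_000, failed_requests=250))
```

For a 99.9% SLO the budget is 0.1% of requests, so a sustained 2% error ratio burns it about 20 times faster than sustainable. Requiring both windows to cross the threshold keeps the long window's resistance to noise while letting the alert resolve as soon as the short window recovers.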