Introduction
This skill provides a framework for Site Reliability Engineering (SRE) practice. It guides users through both the strategic definition and the technical implementation of Service Level Indicators (SLIs) and Service Level Objectives (SLOs). It supplies standardized Prometheus recording rules and multi-window burn rate alerting configurations that track availability, latency, and durability in real time. It also establishes clear error budgets and automated, policy-driven responses when a budget is exhausted. Together, these help engineering teams move from reactive firefighting to data-driven decisions about feature releases and system stability.
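The error budget and burn rate arithmetic behind these alerts can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual rules. The names `WindowStats`, `error_budget`, `burn_rate`, `budget_remaining`, and `should_page` are hypothetical. Request counts are assumed to be already aggregated per window; in Prometheus they would typically come from recording rules. The 14.4 threshold, paired with 1-hour and 5-minute windows, is an assumed default taken from the Google SRE Workbook's example for a 30-day SLO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request counts observed over one lookback window (hypothetical input shape)."""
    total: int   # all requests in the window
    errors: int  # requests that violated the SLI (e.g. 5xx responses)


def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. a 0.999 target leaves 0.001."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(stats: WindowStats, slo_target: float) -> float:
    """Observed error ratio divided by the error budget.

    1.0 spends the budget exactly over the SLO period; 14.4 spends 2% of a
    30-day budget in one hour (14.4 h / 720 h).
    """
    if stats.total == 0:
        return 0.0  # no traffic, nothing burned
    return (stats.errors / stats.total) / error_budget(slo_target)


def budget_remaining(stats: WindowStats, slo_target: float) -> float:
    """Fraction of the period's error budget still unspent (negative once exhausted)."""
    allowed = stats.total * error_budget(slo_target)
    if allowed == 0:
        return 1.0  # no traffic yet, so no budget has been spent
    return 1.0 - stats.errors / allowed


def should_page(long_window: WindowStats, short_window: WindowStats,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Multi-window check: page only if BOTH windows burn at or above the threshold.

    The long window (e.g. 1h) shows the burn is significant; the short window
    (e.g. 5m) shows it is still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(long_window, slo_target) >= threshold
            and burn_rate(short_window, slo_target) >= threshold)


if __name__ == "__main__":
    slo = 0.999
    last_hour = WindowStats(total=10_000, errors=150)  # 1.5% errors -> burn rate ~15
    last_5m = WindowStats(total=1_000, errors=20)      # 2.0% errors -> burn rate ~20
    print(should_page(last_hour, last_5m, slo))        # True: both windows exceed 14.4
    recovered = WindowStats(total=1_000, errors=0)
    print(should_page(last_hour, recovered, slo))      # False: the burn has stopped
    month = WindowStats(total=1_000_000, errors=500)
    print(round(budget_remaining(month, slo), 3))      # 0.5: half the 30-day budget left
```

Requiring both windows to cross the threshold adds a little detection latency but avoids paging on a spike that has already ended. Setups built on this pattern usually add slower window pairs with lower thresholds for ticket-level alerts. `budget_remaining` is the kind of signal a budget-exhaustion policy (for example, a release freeze) would key off.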