Introduction
Establish measurable reliability targets with service level indicators (SLIs), service level objectives (SLOs), and error budgets, so that teams can balance feature velocity against system stability. This skill provides a framework for choosing the metrics that matter most to users, setting realistic targets based on user expectations, and implementing automated alerting with Prometheus and Grafana. It helps teams adopt Site Reliability Engineering (SRE) practices through standardized formulas for calculating error budgets and monitoring burn rates, together with structured review processes that keep service performance aligned with business goals.
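The error budget and burn rate arithmetic can be sketched as below. This is a minimal illustration, not the skill's own implementation: it assumes the common SRE definitions (error budget = 1 - SLO target; burn rate = observed error rate / error budget) and a request-based SLI counted as good versus total requests. The names `SLO`, `allowed_downtime_minutes`, `burn_rate`, `budget_remaining`, and `should_page` are hypothetical, and the default paging threshold of 14.4 is an assumption borrowed from the widely used multi-window pattern (spending 2% of a 30-day budget in one hour).

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical SLO definition for a request-based SLI."""

    target: float  # e.g. 0.999 means 99.9% of requests must succeed
    window_days: int = 30  # rolling compliance window

    @property
    def error_budget(self) -> float:
        # Fraction of requests allowed to fail within the window.
        return 1.0 - self.target


def allowed_downtime_minutes(slo: SLO) -> float:
    # Budget expressed as full-outage minutes over the whole window.
    return slo.error_budget * slo.window_days * 24 * 60


def burn_rate(slo: SLO, good: int, total: int) -> float:
    # 1.0 means the budget would be spent exactly at the end of the window;
    # values above 1.0 mean it runs out early.
    if total <= 0:
        raise ValueError("total must be positive")
    error_rate = (total - good) / total
    return error_rate / slo.error_budget


def budget_remaining(slo: SLO, good: int, total: int) -> float:
    # Fraction of the budget left, assuming the counts cover the full window.
    # A negative result means the budget is already overspent.
    if total <= 0:
        raise ValueError("total must be positive")
    consumed = ((total - good) / total) / slo.error_budget
    return 1.0 - consumed


def should_page(long_window_burn: float, short_window_burn: float,
                threshold: float = 14.4) -> bool:
    # Assumed multi-window rule: page only when both a long window (e.g. 1h)
    # and a short window (e.g. 5m) exceed the threshold, which filters out
    # brief spikes and clears quickly once the problem stops.
    return long_window_burn > threshold and short_window_burn > threshold


if __name__ == "__main__":
    slo = SLO(target=0.999)
    print(f"Allowed downtime: {allowed_downtime_minutes(slo):.1f} min")
    print(f"Burn rate: {burn_rate(slo, good=998, total=1000):.2f}")
```

For a 99.9% target over 30 days, the budget works out to roughly 43.2 minutes of full outage, and a 0.2% error rate burns it at about twice the sustainable pace. In practice the `good` and `total` counts would come from Prometheus queries over the chosen windows; the Python here only shows the arithmetic that such alert rules encode.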