About
This skill provides a framework for implementing Site Reliability Engineering (SRE) practices by defining service level indicators (SLIs), service level objectives (SLOs), and error budgets. It walks users through the technical setup of Prometheus recording rules, multi-window, multi-burn-rate alerts, and Grafana dashboard layouts, so that teams can balance the speed of shipping new features against service stability. By replacing arbitrary uptime targets with metrics that reflect reliability as users actually experience it, the skill helps developers set up proactive monitoring and automate error budget policies.
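To make the error-budget and burn-rate arithmetic concrete, here is a minimal Python sketch. It is not the skill's own implementation. It assumes the per-window error ratios (failed requests divided by total requests) have already been computed, for example by Prometheus recording rules, and are passed in as a plain dictionary. The window pairs and burn-rate factors (14.4 over 1h/5m, 6 over 6h/30m, 1 over 3d/6h) are the commonly cited defaults from Google's SRE Workbook for a 30-day SLO window and are used here as assumptions. The names `BurnRateRule`, `RULES`, `error_budget`, `burn_rate`, `budget_remaining`, and `firing_rules` are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    """One multi-window burn-rate alert condition (hypothetical structure)."""
    long_window: str   # window that measures sustained budget burn
    short_window: str  # window that lets the alert clear once errors stop
    factor: float      # burn-rate threshold both windows must reach
    severity: str      # e.g. "page" or "ticket"


# Assumed defaults for a 30-day SLO window (SRE Workbook values).
RULES = (
    BurnRateRule("1h", "5m", 14.4, "page"),   # 2% of budget spent in 1h
    BurnRateRule("6h", "30m", 6.0, "page"),   # 5% of budget spent in 6h
    BurnRateRule("3d", "6h", 1.0, "ticket"),  # 10% of budget spent in 3d
)


def error_budget(slo_target: float) -> float:
    """Allowed failure ratio, e.g. an SLO of 0.999 leaves a budget of ~0.001."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are accruing."""
    return error_ratio / error_budget(slo_target)


def budget_remaining(total: int, failed: int, slo_target: float) -> float:
    """Fraction of the window's budget left; negative once it is overspent."""
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = total * error_budget(slo_target)
    return 1.0 - failed / allowed_failures


def firing_rules(error_ratios: Mapping[str, float], slo_target: float) -> list[BurnRateRule]:
    """Rules whose long AND short windows both burn at or above their factor."""
    return [
        rule
        for rule in RULES
        if burn_rate(error_ratios[rule.long_window], slo_target) >= rule.factor
        and burn_rate(error_ratios[rule.short_window], slo_target) >= rule.factor
    ]


if __name__ == "__main__":
    # Hypothetical per-window error ratios (failed / total requests).
    incident = {"5m": 0.02, "30m": 0.002, "1h": 0.015, "6h": 0.001, "3d": 0.0005}
    print([r.severity for r in firing_rules(incident, 0.999)])   # ['page']
    # Errors stopped recently: the 1h window is still hot, the 5m window is not.
    recovered = {**incident, "5m": 0.0001}
    print([r.severity for r in firing_rules(recovered, 0.999)])  # []
```

Requiring both windows to exceed the factor is the point of the multi-window design: the long window keeps brief blips from paging anyone, while the short window lets the alert resolve soon after errors stop instead of staying active until the long window drains.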