About
This skill provides a comprehensive framework for Site Reliability Engineering (SRE) practices, focusing on the definition and implementation of SLIs, SLOs, and error budgets. It helps development teams balance innovation velocity with system stability by providing standardized Prometheus recording rules, multi-window burn rate alerts, and Grafana dashboard structures. Use this skill to establish measurable reliability targets, implement automated alerting based on error budget consumption, and provide clear visibility into service performance for both technical and business stakeholders.
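As a rough illustration of the arithmetic behind error budgets and multi-window burn rate alerts, here is a minimal Python sketch. The function names, the example SLO of 99.9%, and the default threshold of 14.4 are assumptions made for illustration, not part of this skill's rules. The 14.4 figure is a commonly cited paging threshold, corresponding to spending 2% of a 30-day budget in one hour (0.02 × 720 hours).

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests that may fail while still meeting the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the budget is being spent.

    A value of 1.0 means the budget would be used up exactly at the end
    of the SLO period; higher values exhaust it proportionally sooner.
    """
    return error_ratio / error_budget(slo_target)


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed default, see lead-in
) -> bool:
    """Multi-window check: fire only if BOTH windows exceed the threshold.

    The long window shows that a meaningful share of budget was spent;
    the short window confirms the problem is still happening.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


if __name__ == "__main__":
    slo = 0.999  # hypothetical 99.9% availability target
    print("budget:", error_budget(slo))
    print("burn rate at 2% errors:", burn_rate(0.02, slo))
    print("page (still failing):", should_page(0.02, 0.02, slo))
    print("page (already recovered):", should_page(0.02, 0.0005, slo))
```

Requiring both windows to exceed the threshold is what keeps an alert from continuing to fire long after a short incident has ended. In a Prometheus setup, the two error ratios would typically come from recording rules evaluated over different time ranges.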