About
This skill provides a framework for defining and monitoring service reliability targets. It helps developers and SREs define meaningful SLIs for availability and latency, set realistic SLO targets, and calculate error budgets to balance innovation with stability. It includes ready-to-use Prometheus recording rules, multi-window alerting strategies, and Grafana dashboard templates, so teams can base decisions about feature velocity and infrastructure investment on measured service performance.
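The error budget arithmetic mentioned above can be illustrated with a minimal Python sketch. It is not the skill's own implementation: the function names `error_budget` and `burn_rate`, their parameters, and the request-count-based availability SLI are all assumptions made for illustration.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error budget usage for a request-based availability SLO.

    Hypothetical helper: slo_target is a fraction such as 0.999 (99.9%).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The budget is the share of requests the SLO allows to fail.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures

    return {
        "availability": 1.0 - failed_requests / total_requests,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,         # 1.0 means the budget is exhausted
        "budget_remaining": 1.0 - consumed,  # negative once the SLO is violated
    }


def burn_rate(slo_target: float, window_requests: int, window_failures: int) -> float:
    """Return how fast the budget burns in one window.

    A value of 1.0 means the budget would be spent exactly at the end of the
    SLO period; higher values mean it runs out sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window_requests <= 0:
        raise ValueError("window_requests must be positive")
    observed_error_rate = window_failures / window_requests
    return observed_error_rate / (1.0 - slo_target)


if __name__ == "__main__":
    # Example figures are made up for illustration.
    print(error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250))
    print(burn_rate(slo_target=0.999, window_requests=1_000, window_failures=10))
```

With a 99.9% target over one million requests, the budget is roughly 1,000 failed requests, so 250 failures consume about a quarter of it. A multi-window alert would typically compare the burn rate over a short and a long window against a threshold before paging; the thresholds themselves are a policy choice and are not shown here.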