This skill provides a framework for Site Reliability Engineering (SRE) practices that lets developers define, measure, and manage service reliability through Claude. It offers standardized patterns for Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets, along with Prometheus recording rules, multi-window burn rate alerting, and Grafana dashboard templates. By tying performance metrics to business requirements, it helps engineering teams decide from data when to prioritize feature development and when to invest in reliability.
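To make the relationship between an SLO and its error budget concrete, here is a minimal Python sketch of the arithmetic. The function name `error_budget` and its fields are hypothetical and are not taken from the skill itself; it assumes a simple request-based SLI (good requests divided by total requests).

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute a request-based SLI and error budget usage (illustrative helper)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # SLI: fraction of requests that succeeded in the measurement window.
    sli = 1.0 - failed_requests / total_requests
    # Error budget: number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (values above 1.0 mean the SLO is breached).
    consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "budget_remaining": 1.0 - consumed,
    }


if __name__ == "__main__":
    # A 99.9% SLO over one million requests tolerates roughly 1,000 failures.
    print(error_budget(0.999, 1_000_000, 250))
```

A team might use the remaining budget as the signal for the feature-versus-reliability decision: plenty of budget left favors shipping, while a nearly exhausted budget favors reliability work.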
Key Features
01 Pre-configured Prometheus recording and alerting rules
02 13 GitHub stars
03 Standardized SLI/SLO/SLA hierarchy definitions
04 Multi-window burn rate alert configurations to reduce noise
05 Grafana dashboard structures for reliability visualization
06 Automated error budget calculation and policy templates
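The multi-window burn rate idea in the list above can be sketched in a few lines of Python. This is an illustration of the general technique, not the skill's actual alert rules: the function names are hypothetical, and the default threshold of 14.4 is an assumption borrowed from a commonly cited convention for a 1-hour/5-minute window pair against a 30-day SLO.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being spent.

    A burn rate of 1.0 spends the budget exactly over the SLO period;
    higher values exhaust it proportionally sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / (1.0 - slo_target)


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Fire only when BOTH windows exceed the threshold (assumed value).

    The long window shows the burn is significant; the short window
    shows it is still happening, which cuts alerts for resolved spikes.
    """
    return (burn_rate(long_window_error_ratio, slo_target) > threshold
            and burn_rate(short_window_error_ratio, slo_target) > threshold)


if __name__ == "__main__":
    # Sustained 2% errors against a 99.9% SLO: both windows burn fast.
    print(should_page(0.02, 0.03, slo_target=0.999))
    # The spike has ended: the short window has recovered, so no page.
    print(should_page(0.02, 0.001, slo_target=0.999))
```

Requiring both windows is what reduces noise: a brief spike that has already recovered raises the long-window rate but not the short-window one, so no page is sent.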