Defines and implements Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to establish measurable reliability targets.
The SLO Implementation skill provides a framework for Site Reliability Engineering (SRE) practices within Claude Code. It helps developers define critical Service Level Indicators, set realistic Service Level Objectives, and calculate error budgets to balance service stability with development velocity. It includes Prometheus recording rules, Grafana dashboard structures, and multi-window burn rate alerting patterns for measuring user-perceived reliability and automating responses to service degradation, so that infrastructure meets business requirements without stalling feature work.
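The error budget idea mentioned above can be illustrated with a small calculation. This is a minimal sketch, not the skill's own formula: the function name `error_budget_minutes` and the 30-day default window are assumptions chosen for illustration.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the allowed minutes of unavailability for a time-based SLO.

    slo_target is a fraction such as 0.999 (99.9%); window_days is the
    rolling compliance window (30 days here is an assumed default).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    # The error budget is the share of the window the SLO allows to fail.
    return (1.0 - slo_target) * window_minutes


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
```

Tightening the target from 99.9% to 99.99% shrinks the budget by a factor of ten, which is why the target should follow what users actually need rather than what sounds impressive.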
Key Features
01 Pre-structured Grafana dashboard queries for real-time observability
02 Prometheus recording rules for automated availability and latency metrics
03 Multi-window burn rate alerting patterns to minimize false positives
04 Error budget calculation formulas and management policy templates
05 Standardized SLI/SLO/SLA hierarchy definition and documentation
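Multi-window burn rate alerting can be sketched as follows. The burn rate is the observed error ratio divided by the error ratio the SLO allows, and an alert fires only when both a long and a short window exceed the threshold, which filters out brief spikes. The window pairs and thresholds (14.4 for 1h/5m, 6 for 6h/30m) follow the convention commonly cited from the Google SRE Workbook for a 30-day SLO; they are an assumption here, not values confirmed from the skill, and the names `BurnRateRule` and `firing_rules` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str   # e.g. "1h"
    short_window: str  # e.g. "5m"
    threshold: float   # burn rate above which the rule fires


# Assumed window pairs and thresholds (SRE Workbook convention, 30-day SLO).
RULES = [
    BurnRateRule("1h", "5m", 14.4),
    BurnRateRule("6h", "30m", 6.0),
]


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    return error_ratio / (1.0 - slo_target)


def firing_rules(error_ratios: dict, slo_target: float) -> list:
    """Return the rules whose long AND short windows both exceed the threshold.

    error_ratios maps a window label ("1h", "5m", ...) to the fraction of
    failed requests observed over that window.
    """
    fired = []
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], slo_target)
        short_burn = burn_rate(error_ratios[rule.short_window], slo_target)
        if long_burn > rule.threshold and short_burn > rule.threshold:
            fired.append(rule)
    return fired


if __name__ == "__main__":
    observed = {"1h": 0.02, "5m": 0.03, "6h": 0.004, "30m": 0.001}
    for rule in firing_rules(observed, slo_target=0.999):
        print(f"fire: {rule.long_window}/{rule.short_window}")
```

Requiring the short window to agree with the long one also lets the alert clear quickly once the problem stops, instead of staying active until the long window drains.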
Use Cases
01 Implementing SLO-based alerting to replace noisy threshold-based alerts
02 Managing development velocity by tracking remaining error budgets
03 Establishing internal reliability targets for production microservices
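Tracking the remaining error budget to manage development velocity might look like the sketch below, which works from request counts rather than time. The policy tiers ("normal", "caution", "freeze") and the 25% cutoff are hypothetical values for illustration; an actual error budget policy would be agreed between the teams involved.

```python
def remaining_budget_fraction(total_requests: int,
                              failed_requests: int,
                              slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failures = total_requests * (1.0 - slo_target)
    if allowed_failures == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    return 1.0 - failed_requests / allowed_failures


def release_policy(remaining: float) -> str:
    """Map remaining budget to a release stance (hypothetical thresholds)."""
    if remaining <= 0.0:
        return "freeze"   # budget exhausted: prioritize reliability work
    if remaining < 0.25:
        return "caution"  # budget nearly spent: slow down risky changes
    return "normal"       # healthy budget: ship at the usual pace


if __name__ == "__main__":
    remaining = remaining_budget_fraction(1_000_000, 400, slo_target=0.999)
    print(release_policy(remaining))
```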