What metrics can I track with this skill?

You can track Availability, Latency, and Durability SLIs using the provided PromQL query templates and recording rules.
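Each of these SLIs reduces to a ratio of good events to total events over a window, which is what a PromQL template typically computes with `rate()` over counters. The minimal Python sketch below shows only that arithmetic. The helper name `ratio_sli` and the example event definitions (non-5xx responses, requests under a latency threshold, objects still readable) are illustrative assumptions, not taken from the skill.

```python
def ratio_sli(good: int, total: int) -> float:
    """Return the good/total ratio for one SLO window.

    Hypothetical helper: the skill computes this ratio in PromQL; this only
    illustrates the arithmetic. A window with no events counts as 1.0.
    """
    if total < 0 or good < 0 or good > total:
        raise ValueError("expected 0 <= good <= total")
    if total == 0:
        return 1.0
    return good / total


# Illustrative event definitions (assumptions, not the skill's own):
availability = ratio_sli(good=999_000, total=1_000_000)  # non-5xx / all requests
latency = ratio_sli(good=990_000, total=1_000_000)       # requests under 300 ms / all
durability = ratio_sli(good=1_000_000, total=1_000_000)  # objects readable / stored
print(availability, latency, durability)
```

Treating an empty window as fully healthy is one common convention. Some teams report "no data" instead, so that an idle service does not look artificially reliable.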

Can I use this for non-Prometheus systems?

The SLI/SLO definitions and error budget formulas apply to any monitoring system, but the code examples and implementation steps are written for Prometheus and Grafana stacks.

How does it help manage error budgets?

It provides formulas to calculate remaining budgets and suggests specific policies (like feature freezes) based on how much budget has been consumed.
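A common form of the remaining-budget formula is: allowed bad events = total × (1 − SLO), and remaining fraction = 1 − bad / allowed. The Python sketch below implements that and maps the result to a policy action. The function names and the 50% / 25% / 0% policy thresholds are hypothetical placeholders, not the skill's actual policy table.

```python
def error_budget_remaining(slo: float, total: int, bad: int) -> float:
    """Fraction of the window's error budget still unspent.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1, exclusive")
    allowed_bad = total * (1.0 - slo)  # bad events the SLO tolerates this window
    if allowed_bad == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    return 1.0 - bad / allowed_bad


def budget_policy(remaining: float) -> str:
    """Map remaining budget to an action (thresholds are illustrative assumptions)."""
    if remaining <= 0.0:
        return "feature freeze: only reliability fixes ship"
    if remaining < 0.25:
        return "prioritize reliability work over new features"
    if remaining < 0.5:
        return "proceed with extra release caution"
    return "normal release cadence"


# Example: 99.9% SLO, 1,000,000 requests, 250 failed, so roughly 75% of the budget is left.
remaining = error_budget_remaining(slo=0.999, total=1_000_000, bad=250)
print(f"{remaining:.2%} remaining: {budget_policy(remaining)}")
```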

Does this skill include alerting configurations?

Yes. It includes Prometheus alerting rules for fast and slow error budget burns that use multi-window burn-rate logic to reduce false positives.
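Burn rate is the observed error ratio divided by the error ratio the SLO allows (1 − SLO). A burn rate of 1 spends exactly the whole budget over the SLO window. The Python sketch below mirrors the condition that multi-window Prometheus rules evaluate. The window pairs and thresholds (1h/5m at 14.4, 6h/30m at 6) follow the example in Google's SRE Workbook for a 30-day window. They are assumptions here and may differ from the skill's own rules. The input error ratios stand in for values that recording rules would supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateRule:
    name: str
    long_window: str   # label only, e.g. "1h"
    short_window: str  # label only, e.g. "5m"
    threshold: float   # multiple of the sustainable burn rate


# Assumed values from the SRE Workbook example for a 30-day SLO window:
# 14.4 = 2% of the budget in 1h, 6 = 5% of the budget in 6h.
RULES = [
    BurnRateRule("fast_burn", "1h", "5m", 14.4),
    BurnRateRule("slow_burn", "6h", "30m", 6.0),
]


def burn_rate(error_ratio: float, slo: float) -> float:
    """How many times faster than sustainable the budget is being spent."""
    return error_ratio / (1.0 - slo)


def firing(rule: BurnRateRule, long_error_ratio: float,
           short_error_ratio: float, slo: float) -> bool:
    # Both windows must exceed the threshold: the long window shows the burn
    # is significant, the short window shows it is still happening.
    return (burn_rate(long_error_ratio, slo) > rule.threshold
            and burn_rate(short_error_ratio, slo) > rule.threshold)


# 2% errors in both windows against a 99.9% SLO is a burn rate of about 20.
for rule in RULES:
    print(rule.name, firing(rule, 0.02, 0.02, slo=0.999))
```

Requiring both windows is what suppresses noise. A brief spike raises only the short window, so it does not page. Once an incident ends, the short window drops below the threshold and the alert resolves, even though the long window is still elevated.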

SRE SLO Implementation

Name: SRE SLO Implementation
Author: JantonioFC


Analytics & Monitoring

Implements and manages service reliability targets using SLIs, SLOs, and error budgets for robust observability and performance tracking.

The SLO Implementation skill provides a comprehensive framework for defining, measuring, and alerting on service reliability. It enables developers and SREs to establish Service Level Indicators (SLIs) for availability, latency, and durability, set realistic Service Level Objectives (SLOs), and manage error budgets effectively. By integrating Prometheus recording rules and Grafana dashboard structures, this skill helps teams balance innovation velocity with operational stability through data-driven reliability management and automated alerting policies.

Key Features

01. 2 GitHub stars

02. Standardized SLO review processes and reporting templates

03. Automated SLI definitions for availability, latency, and durability

04. Prometheus recording rules and multi-window burn rate alerts

05. Error budget calculation formulas and policy enforcement logic

06. Grafana dashboard structures for real-time reliability visualization

Use Cases

01. Implementing SRE practices to balance feature velocity and stability

02. Configuring sophisticated alerting to reduce monitoring noise and pager fatigue

03. Establishing reliability targets for production-grade microservices
