What is the difference between an SLI and an SLO?

An SLI (Service Level Indicator) is a specific measurement of a service's performance, such as latency or success rate. An SLO (Service Level Objective) is the target value or range of values for that SLI that represents acceptable reliability.
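As a rough illustration of the distinction, the sketch below treats the SLI as a measured ratio and the SLO as the target it is compared against. The names `AvailabilitySLO`, `availability_sli`, and `meets_slo` are hypothetical and are not part of the skill. The 99.9% target and the request counts are made-up example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLO:
    """An SLO: the target that a measured SLI is compared against."""
    target: float  # e.g. 0.999 means 99.9% of requests should succeed


def availability_sli(good_requests: int, total_requests: int) -> float:
    """The SLI: the measured fraction of requests that succeeded."""
    if total_requests == 0:
        return 1.0  # assumption: a window with no traffic counts as fully available
    return good_requests / total_requests


def meets_slo(sli: float, slo: AvailabilitySLO) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli >= slo.target


# Example values only: 99,950 successful requests out of 100,000.
slo = AvailabilitySLO(target=0.999)
sli = availability_sli(good_requests=99_950, total_requests=100_000)
print(f"SLI={sli:.4f}, target={slo.target}, met={meets_slo(sli, slo)}")
```

The SLI changes with every measurement window, while the SLO stays fixed until the team deliberately revises it.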

Does it include latency monitoring?

Yes. The skill includes pre-defined patterns for tracking latency SLIs, such as the percentage of requests completed under a specific millisecond threshold, or a percentile target such as P95 latency.
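The two styles of latency measurement can be sketched as follows. This is a minimal illustration, not the skill's own pattern: `latency_sli` and `percentile` are hypothetical helpers, the sample latencies are invented, and the percentile uses the simple nearest-rank method rather than the histogram interpolation a monitoring system would typically apply.

```python
import math


def latency_sli(latencies_ms: list[float], threshold_ms: float) -> float:
    """Fraction of requests that completed within threshold_ms."""
    if not latencies_ms:
        return 1.0  # assumption: no traffic counts as meeting the objective
    fast = sum(1 for value in latencies_ms if value <= threshold_ms)
    return fast / len(latencies_ms)


def percentile(latencies_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Invented sample of ten request latencies in milliseconds.
samples = [120, 80, 95, 310, 150, 70, 205, 90, 110, 640]

print("share of requests under 300 ms:", latency_sli(samples, threshold_ms=300))
print("P95 latency in ms:", percentile(samples, 95))
```

The threshold form yields a ratio that can be compared directly with an SLO target, which is why it fits error budget calculations more naturally than a raw percentile.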

How does this skill support Prometheus?

It provides specific PromQL templates for recording rules that calculate SLI ratios and SLO compliance, as well as complex alerting rules for multi-window burn rates.
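The arithmetic behind a multi-window burn rate alert can be sketched in Python. The skill's actual templates are PromQL and are not reproduced here. The function names are hypothetical, and the 14.4 threshold paired with a long and a short window is a commonly cited example (a 14.4x burn sustained for one hour spends 2% of a 30-day budget), not a value confirmed from the skill.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' the error budget is being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_errors: float, short_window_errors: float,
                slo_target: float, threshold: float) -> bool:
    """Fire only when both the long and the short window exceed the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(long_window_errors, slo_target) > threshold
            and burn_rate(short_window_errors, slo_target) > threshold)


SLO_TARGET = 0.999  # example target
FAST_BURN = 14.4    # example threshold, e.g. for a 1 h long window and 5 min short window

# Ongoing incident: both windows show elevated error ratios.
print(should_page(0.02, 0.03, SLO_TARGET, FAST_BURN))

# Incident already over: the short window has recovered, so no page.
print(should_page(0.02, 0.0005, SLO_TARGET, FAST_BURN))
```

In a Prometheus setup, the two error ratios would usually come from recording rules evaluated over the respective windows, with the comparison expressed in the alerting rule.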

What is an error budget policy?

An error budget policy defines what actions a team should take based on how much of their reliability budget remains, such as freezing new features when the budget is exhausted to focus on stability.
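A policy of this kind can be expressed as a small lookup from remaining budget to action. The sketch below is illustrative only: the function names, the 25% warning tier, and the action wording are assumptions, and only the freeze-on-exhaustion rule comes from the description above.

```python
def budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures <= 0:
        raise ValueError("need slo_target below 1.0 and total above 0")
    actual_failures = total - good
    return 1.0 - actual_failures / allowed_failures


def policy_action(remaining: float) -> str:
    """Map remaining budget to an action. Tiers below are hypothetical."""
    if remaining <= 0.0:
        return "freeze feature releases and focus on stability"
    if remaining < 0.25:
        return "slow releases and require a reliability review"
    return "ship normally"


# Example: a 99.9% SLO over one million requests allows about 1,000 failures.
for good in (999_400, 999_200, 998_800):
    remaining = budget_remaining(0.999, good, 1_000_000)
    print(f"{1_000_000 - good} failures -> {remaining:+.0%} left: {policy_action(remaining)}")
```

Writing the tiers down in advance is the point of the policy: the decision to freeze is agreed before the budget runs out, not negotiated during an incident.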

SLO & Reliability Engineering

Name: SLO & Reliability Engineering
Author: gwickman

Analytics & Monitoring

Implements Service Level Objectives (SLOs) and error budgets to balance service reliability with development velocity.

The SLO Implementation skill provides a comprehensive Site Reliability Engineering (SRE) framework for defining Service Level Indicators (SLIs), establishing measurable reliability targets, and managing error budgets. It enables teams to quantify user-perceived performance and make data-driven decisions about feature delivery versus system stability. By providing standardized templates for Prometheus recording rules, multi-window burn rate alerts, and Grafana dashboard structures, this skill helps developers move beyond basic uptime monitoring to sophisticated, business-aligned observability practices.

Key Features

01 Standardized SLI/SLO/SLA hierarchy definitions

02 Multi-window burn rate alert configurations

03 Prometheus recording rule and alerting logic generation

04 Error budget calculation and policy templates

05 Availability and latency measurement patterns

Use Cases

01 Configuring advanced observability alerts to reduce noise

02 Establishing reliability targets for microservices

03 Implementing SRE practices within DevOps workflows
