What is the difference between SLIs and SLOs in this skill?

SLIs (Service Level Indicators) are the actual measurements of your service's performance, while SLOs (Service Level Objectives) are the specific target values or ranges those measurements should stay within to satisfy users.
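
As a rough illustration of the distinction, the sketch below computes an availability SLI from request counts and checks it against an SLO target. The function names, the request counts, and the 99.9% target are illustrative assumptions, not values taken from the skill.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: the measured fraction of events that were good."""
    if total_events == 0:
        # Assumption: a window with no traffic is treated as fully available.
        return 1.0
    return good_events / total_events


def meets_slo(sli: float, slo_target: float) -> bool:
    """SLO: the target value the measured SLI is compared against."""
    return sli >= slo_target


if __name__ == "__main__":
    # Hypothetical counts for one measurement window.
    sli = availability_sli(good_events=999_500, total_events=1_000_000)
    print(f"SLI = {sli:.4%}, meets 99.9% SLO: {meets_slo(sli, 0.999)}")
```

The SLI is whatever the service actually did in the window; the SLO is the fixed number it is judged against.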

How does the skill help with alert fatigue?

It implements multi-window burn rate alerts. An alert fires only when the error budget burn rate exceeds its threshold over both a short window and a long window, which filters out transient spikes and significantly reduces false positives.
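
The decision logic behind such an alert can be sketched in a few lines. This is a minimal sketch rather than the skill's actual rules: the function names are hypothetical, and the 14.4 threshold is the value commonly paired with a 1-hour long window and a 5-minute short window for a 30-day SLO, assumed here for illustration.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_budget = 1.0 - slo_target
    return error_ratio / error_budget


def should_alert(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed threshold, not taken from the skill
) -> bool:
    """Fire only if BOTH windows are burning budget faster than the threshold."""
    long_burn = burn_rate(long_window_error_ratio, slo_target)
    short_burn = burn_rate(short_window_error_ratio, slo_target)
    return long_burn > threshold and short_burn > threshold


if __name__ == "__main__":
    # Sustained problem: both windows show a 2% error ratio against a 99.9% SLO.
    print(should_alert(0.02, 0.02, slo_target=0.999))
    # Transient spike: only the short window is elevated.
    print(should_alert(0.001, 0.05, slo_target=0.999))
```

The long window shows that the problem is significant; the short window shows that it is still happening, so the alert also clears quickly once the errors stop.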

Can I use this for non-web services?

Absolutely. While many examples use HTTP metrics, the framework includes durability SLIs for storage systems and general formulas applicable to message queues and background processing.
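
The same good-events-over-total-events ratio carries over to non-HTTP workloads. The sketch below applies it to a message queue by counting messages processed within a deadline; the function name, the deadline, and the sample delays are hypothetical and not part of the skill.

```python
from typing import Sequence


def freshness_sli(processing_delays_s: Sequence[float], deadline_s: float) -> float:
    """Fraction of messages processed within the deadline (good / total)."""
    if not processing_delays_s:
        # Assumption: an empty window is treated as meeting the objective.
        return 1.0
    good = sum(1 for delay in processing_delays_s if delay <= deadline_s)
    return good / len(processing_delays_s)


if __name__ == "__main__":
    # Hypothetical enqueue-to-completion delays, in seconds.
    delays = [1.0, 2.0, 30.0, 4.0]
    print(f"Freshness SLI: {freshness_sli(delays, deadline_s=10.0):.2%}")
```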

How do error budgets help my development team?

Error budgets provide a neutral, data-driven mechanism to balance reliability and velocity. While budget remains, the team can ship quickly; once it is exhausted, the team prioritizes stability over new features.
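
The arithmetic behind that decision can be sketched as follows. The function names, the counts, and the two-outcome policy are simplifying assumptions for illustration; a real error budget policy would typically define more gradations.

```python
def error_budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_bad = (1.0 - slo_target) * total_events
    if allowed_bad <= 0:
        # Assumption: with no allowed failures there is no budget to spend.
        return 0.0
    return 1.0 - bad_events / allowed_bad


def release_policy(remaining: float) -> str:
    """Hypothetical two-outcome policy based on the remaining budget."""
    return "ship features" if remaining > 0 else "prioritize reliability work"


if __name__ == "__main__":
    # A 99.9% SLO over 1,000,000 requests allows roughly 1,000 failures.
    remaining = error_budget_remaining(0.999, total_events=1_000_000, bad_events=250)
    print(f"Budget remaining: {remaining:.0%} -> {release_policy(remaining)}")
```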

SLO Implementation & SRE Framework

Author: HermeticOrmus


Analytics & Monitoring

Defines and implements Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets to ensure system reliability.

This skill provides a comprehensive framework for Site Reliability Engineering (SRE) practices, focusing on the hierarchy of SLAs, SLOs, and SLIs. It enables developers and SREs to implement measurable reliability targets using Prometheus recording rules, multi-window burn rate alerts, and formal error budget policies. By providing structured templates and implementation patterns for availability and latency, it helps teams balance innovation velocity with system stability, facilitating data-driven decisions on when to prioritize reliability over new features.

Key Features

01 Error budget calculation formulas and management policies

02 SLI/SLO/SLA hierarchy mapping and definition templates

03 PromQL-based measurement for availability, latency, and durability

04 Standardized Grafana dashboard structures for reliability visualization

05 Multi-window burn rate alerting to reduce monitoring noise


Use Cases

01 Establishing reliability targets for production microservices

02 Implementing SRE error budget policies to manage development velocity

03 Setting up sophisticated Prometheus alerts for rapid reliability degradation


Install with 🐟 Skill.Fish

npx skillfish add hermeticormus/hermetic-academy slo-implementation
