How does this integrate with Prometheus and Grafana?

The skill generates ready-to-use Prometheus recording and alerting rules, as well as structural guidance for building Grafana dashboards that visualize error budget consumption.
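As a rough illustration of what such a recording rule might look like, the Python sketch below renders a single availability rule in the Prometheus rule-file format. The counter `http_requests_total`, its `code` label, the `job` value, and the rule and group names are assumptions chosen for the example, not output taken from the skill.

```python
def availability_recording_rule(job: str, window: str) -> str:
    """Render one Prometheus recording rule as YAML text.

    Assumes a hypothetical counter `http_requests_total` with a `code` label.
    """
    # Ratio of non-5xx requests to all requests over the given window.
    expr = (
        f'sum(rate(http_requests_total{{job="{job}",code!~"5.."}}[{window}]))'
        f' / sum(rate(http_requests_total{{job="{job}"}}[{window}]))'
    )
    return "\n".join([
        "groups:",
        "  - name: slo_availability",
        "    rules:",
        f"      - record: job:sli_availability:ratio_rate{window}",
        # Single-quote the expression so YAML treats it as one string.
        f"        expr: '{expr}'",
    ])


rule_yaml = availability_recording_rule(job="api", window="5m")
print(rule_yaml)
```

A Grafana panel could then query the recorded series directly instead of re-evaluating the full ratio on every dashboard refresh.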

Can I use this for alerting in production?

Yes. It includes multi-window burn-rate alerting logic designed to catch both sudden outages (fast burn) and gradual reliability degradation (slow burn).
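The idea behind multi-window burn-rate alerting can be sketched in a few lines of Python. This is a minimal illustration, not the skill's generated rules: the 28-day SLO window, the budget fractions (2% in 1 hour for fast burn, 5% in 6 hours for slow burn), and all function names are assumptions made for the example.

```python
SLO_WINDOW_HOURS = 28 * 24  # assumed 28-day SLO window


def burn_rate_threshold(budget_fraction: float, alert_window_hours: float) -> float:
    """Burn rate that would spend `budget_fraction` of the budget in the alert window."""
    return budget_fraction * SLO_WINDOW_HOURS / alert_window_hours


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / (1.0 - slo_target)


def should_alert(long_ratio: float, short_ratio: float,
                 slo_target: float, threshold: float) -> bool:
    # Both windows must exceed the threshold: the long window shows the burn
    # is significant, the short window shows it is still happening.
    return (burn_rate(long_ratio, slo_target) > threshold
            and burn_rate(short_ratio, slo_target) > threshold)


fast = burn_rate_threshold(0.02, 1)  # assumed: 2% of budget in 1 hour
slow = burn_rate_threshold(0.05, 6)  # assumed: 5% of budget in 6 hours

# Ongoing outage: both the 1-hour and 5-minute error ratios are high.
print(should_alert(0.02, 0.03, slo_target=0.999, threshold=fast))
# Recovered incident: the long window is still elevated, the short one is not.
print(should_alert(0.02, 0.0005, slo_target=0.999, threshold=fast))
```

Requiring the short window to agree is what lets the alert clear quickly once an incident is over, which is the main way this approach reduces false positives.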

What metrics does this skill help me track?

It provides frameworks for tracking Service Level Indicators (SLIs) such as request availability, p95 latency thresholds, and storage durability using PromQL.
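To make the SLI definitions concrete, here is a small Python sketch of an availability SLI and a latency SLI computed from raw counts. The function names and the treatment of zero traffic are assumptions for illustration. The latency SLI is expressed as the fraction of requests at or under a threshold, so a "p95 under 500 ms" target corresponds to this fraction being at least 0.95.

```python
from typing import Sequence


def availability_sli(good_requests: int, total_requests: int) -> float:
    """Fraction of requests that succeeded."""
    if total_requests == 0:
        # Assumption: with no traffic, treat the objective as met.
        return 1.0
    return good_requests / total_requests


def latency_sli(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of requests served at or under the latency threshold."""
    if not latencies_ms:
        return 1.0
    fast_enough = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return fast_enough / len(latencies_ms)


print(availability_sli(good_requests=9990, total_requests=10000))
print(latency_sli([100, 200, 300, 900], threshold_ms=500))
```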

What is an error budget policy?

An error budget policy is a defined set of actions (such as freezing non-critical changes) taken when a service consumes too much of its error budget within a 28-day window.
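The arithmetic behind such a policy is simple enough to sketch in Python. The 28-day window comes from the text above; the policy thresholds (75% and 100% consumed) and the action names are hypothetical values chosen for the example, not part of the skill.

```python
WINDOW_DAYS = 28


def error_budget_minutes(slo_target: float) -> float:
    """Minutes of full unavailability allowed in the window for a given SLO."""
    return (1.0 - slo_target) * WINDOW_DAYS * 24 * 60


def budget_consumed(slo_target: float, observed_error_ratio: float) -> float:
    """Fraction of the error budget used, given the window's observed error ratio."""
    return observed_error_ratio / (1.0 - slo_target)


def policy_action(consumed: float) -> str:
    # Hypothetical thresholds; a real policy is agreed on by the team.
    if consumed >= 1.0:
        return "freeze non-critical changes"
    if consumed >= 0.75:
        return "review risky releases"
    return "ship normally"


# A 99.9% SLO over 28 days allows roughly 40 minutes of downtime.
print(round(error_budget_minutes(0.999), 2))
print(policy_action(budget_consumed(0.999, observed_error_ratio=0.002)))
```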

SLO Implementation & SRE Metrics

Name: SLO Implementation & SRE Metrics
Author: HermeticOrmus

Analytics & Monitoring

Defines and implements service reliability targets using SLIs, SLOs, and error budgets to balance innovation with system stability.

This skill provides a framework for Site Reliability Engineering (SRE) practices, helping teams move beyond simple monitoring to goal-oriented observability. It supports defining Service Level Indicators (SLIs) for availability, latency, and durability, and setting realistic Service Level Objectives (SLOs) based on user expectations. By calculating error budgets and generating Prometheus alerting rules, including multi-window burn-rate alerts, it makes reliability measurable, actionable, and aligned with business objectives, and helps teams decide whether to prioritize feature work or stability improvements.

Key Features

01 Implements multi-window burn-rate alerts to minimize false positives and alert fatigue

02 Provides standardized templates for SLI definition (availability, latency, durability)

03 Calculates error budgets to guide deployment frequency and risk management

04 Generates Prometheus recording rules for high-performance metric aggregation

05 Provides structured Grafana dashboard layouts for SLO and budget visualization

Use Cases

01 Optimizing on-call rotations by migrating to SLO-based alerting over raw threshold alerts

02 Implementing automated error budget policies to govern release velocity

03 Establishing quantifiable reliability targets for production-grade microservices

Install with 🐟 Skill.Fish

npx skillfish add hermeticormus/floreserlife slo-implementation
