Can I use this for Grafana dashboards?

Yes. The skill provides a recommended Grafana dashboard structure and queries for visualizing SLO compliance, burn rates, and remaining error budgets.

How does this skill help with SRE?

It provides a structured framework for defining SLIs and SLOs, helping teams adopt Site Reliability Engineering practices by balancing service uptime with development speed.

What is an error budget and how is it used?

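An error budget is the amount of unreliability your SLO permits, calculated as 1 minus the SLO target; a 99.9% SLO, for example, leaves a 0.1% budget. This skill helps you track how much of that budget remains so you can decide when to freeze feature work and focus on reliability.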
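As a rough illustration of the "1 minus your SLO" arithmetic, the sketch below converts an SLO target into an error budget, an allowed downtime, and a remaining-budget fraction. The function names and the 30-day default window are assumptions made for this example; they are not taken from the skill itself.

```python
def error_budget(slo_target: float) -> float:
    """Return the error budget as a fraction, i.e. 1 - SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full outage the budget tolerates over the window (assumed 30 days)."""
    return error_budget(slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the request-based budget still unspent (negative once overspent)."""
    if total_requests <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = error_budget(slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% SLO over 30 days tolerates roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(0.999), 1))
    # 250 failures out of 1,000,000 requests spends about a quarter of the budget.
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

A team might compare the remaining fraction against a policy threshold of its own choosing to decide when to pause feature releases.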

Does it support Prometheus and PromQL?

Yes. The skill includes PromQL queries for recording rules, plus alerting rules that measure availability and latency against their thresholds.

SLO & Error Budget Implementation

Name: SLO & Error Budget Implementation
Author: GaitanS


Analytics & Monitoring

Defines and implements Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure system reliability and manage error budgets.

This skill provides a comprehensive framework for Site Reliability Engineering (SRE) practices, allowing teams to measure service performance through specific SLIs like availability, latency, and durability. It guides the setup of internal reliability targets (SLOs), calculates error budgets to balance innovation with stability, and provides ready-to-use Prometheus recording rules and alerting configurations. By implementing multi-window burn rate alerts and standardized Grafana dashboard structures, it helps developers maintain high service health while providing clear data for operational decision-making.
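The multi-window burn rate idea mentioned above can be sketched as follows. Burn rate is the observed error ratio divided by the error budget, and a multi-window alert fires only when both a long and a short window exceed the same threshold, which filters out brief spikes and incidents that have already recovered. The function names are hypothetical, and the 14.4 threshold is a commonly cited value for a one-hour/five-minute pair on a 30-day SLO, assumed here rather than taken from the skill.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the budget is being spent: 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0.0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_alert(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed threshold, not from the skill
) -> bool:
    """Fire only if BOTH windows are burning budget faster than the threshold."""
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


if __name__ == "__main__":
    # Sustained 2% errors against a 99.9% SLO: both windows burn at about 20x.
    print(should_alert(0.02, 0.02, 0.999))
    # The long window is still elevated but the short window has recovered.
    print(should_alert(0.02, 0.0005, 0.999))
```

Requiring the short window to agree is what lets the alert clear quickly once the problem is fixed, which is one way such rules reduce notification fatigue.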

Key Features

01 Error budget calculation formulas and management policies

02 0 GitHub stars

03 Pre-configured Prometheus recording and alerting rules for availability and latency

04 Visual dashboard structures for Grafana monitoring

05 Standardized SLI/SLO/SLA hierarchy mapping and definitions

06 Multi-window burn rate alert templates to reduce notification fatigue

Use Cases

01 Configuring intelligent, proactive alerts based on error budget consumption

02 Establishing measurable reliability targets for production-grade microservices

03 Implementing SRE practices to balance feature velocity with system uptime


Install with 🐟 Skill.Fish

npx skillfish add gaitans/ai-mancare slo-implementation

For use in Claude.ai and ChatGPT
