How does this skill help manage error budgets?

It provides mathematical formulas and Prometheus rules to track consumed versus remaining reliability budgets, helping teams decide when to focus on stability versus feature velocity.
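As a rough illustration of the consumed-versus-remaining arithmetic, the following minimal sketch computes an error budget for a request-based SLO. The function name and the request-count inputs are assumptions made for this example and are not taken from the skill itself.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Return allowed, consumed, and remaining error budget for one SLO window.

    Hypothetical helper for illustration; slo_target is a fraction such as 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The budget is the share of requests that may fail without breaking the SLO.
    allowed_failures = (1.0 - slo_target) * total_requests
    consumed = failed_requests / allowed_failures if allowed_failures else 0.0
    return {
        "allowed_failures": allowed_failures,
        "consumed_fraction": consumed,
        # Goes negative once the budget is overspent.
        "remaining_fraction": 1.0 - consumed,
    }


budget = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"consumed {budget['consumed_fraction']:.0%}, remaining {budget['remaining_fraction']:.0%}")
```

With a 99.9% target over one million requests, about 1,000 failures are allowed, so 250 failures consume roughly a quarter of the budget. A team could compare the remaining fraction against its own policy threshold when deciding between stability work and feature work.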

What is the difference between SLIs and SLOs in this skill?

SLIs (Service Level Indicators) are the actual measurements of service performance, such as request latency or success rate, while SLOs (Service Level Objectives) are the targets set for those measurements, such as a 99.9% success rate.
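The distinction can be sketched in a few lines: the SLI is a number computed from observed events, and the SLO is the threshold it is compared against. The function names and the choice to treat a no-traffic window as healthy are assumptions made for this example.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: the measured fraction of events that were good."""
    if total_events == 0:
        # Assumption for this sketch: a window with no traffic counts as healthy.
        return 1.0
    return good_events / total_events


def meets_slo(sli: float, slo_target: float) -> bool:
    """SLO check: is the measurement at or above the target?"""
    return sli >= slo_target


sli = availability_sli(good_events=9_995, total_events=10_000)  # the measurement
print(sli, meets_slo(sli, slo_target=0.999))                    # compared to the target
```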

What are multi-window burn rate alerts?

These are advanced alerting patterns included in the skill that check multiple time windows simultaneously to detect rapid budget depletion while ignoring minor, short-lived spikes.
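A minimal sketch of the idea, assuming the error rate for each window has already been computed elsewhere: the alert fires only when both a long and a short window burn budget faster than a threshold. The function names are hypothetical, and the default threshold of 14.4 is the figure commonly cited from the Google SRE Workbook for a 1-hour and 5-minute window pair; whether the skill uses the same values is not confirmed by this page.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_rate / (1.0 - slo_target)


def should_page(long_window_error_rate: float,
                short_window_error_rate: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Fire only if BOTH windows exceed the burn-rate threshold.

    The long window shows the burn is sustained; the short window shows
    it is still happening now, so the alert resets soon after recovery.
    """
    return (burn_rate(long_window_error_rate, slo_target) > threshold
            and burn_rate(short_window_error_rate, slo_target) > threshold)


# Sustained, ongoing burn: both windows are well above the threshold.
print(should_page(0.02, 0.03, slo_target=0.999))
# Brief spike: the short window is hot, but the long window is not.
print(should_page(0.001, 0.05, slo_target=0.999))
```

Requiring both windows is what filters out short-lived spikes: a spike raises the short-window rate, but rarely moves the long-window rate enough to cross the same threshold.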

Can I use this with Prometheus and Grafana?

Yes, the skill includes PromQL queries, recording rules, and dashboard structures designed for Prometheus and Grafana observability stacks.
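To give a sense of what such a query might look like, the sketch below assembles an error-ratio PromQL expression and wraps it in a recording-rule entry. The metric name `http_requests_total`, the `code` label, and the rule name are assumptions chosen for illustration; the skill's actual queries and rule names are not shown on this page.

```python
def error_ratio_expr(metric: str, error_matcher: str, window: str) -> str:
    """Build a PromQL expression for errors divided by total requests."""
    errors = f"sum(rate({metric}{{{error_matcher}}}[{window}]))"
    total = f"sum(rate({metric}[{window}]))"
    return f"{errors} / {total}"


def recording_rule(record: str, expr: str) -> dict:
    """One rule entry, using the 'record' and 'expr' fields of a Prometheus rule file."""
    return {"record": record, "expr": expr}


# Hypothetical metric, label matcher, and rule name.
expr = error_ratio_expr("http_requests_total", 'code=~"5.."', "5m")
rule = recording_rule("job:slo_errors:ratio_rate5m", expr)
print(rule["expr"])
```

Precomputing the ratio as a recorded series keeps alert expressions and Grafana panels short, since they can refer to the recorded name instead of repeating the full query.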

SLO Implementation

Name: SLO Implementation
Author: EngineerWithAI

Analytics and Monitoring

Defines and implements Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure and maintain service reliability.

This skill provides a comprehensive framework for Site Reliability Engineering (SRE) practices, helping teams balance innovation velocity with system reliability. It enables the creation of precise SLIs for availability and latency, establishes realistic SLO targets based on user expectations, and automates error budget calculations. By integrating Prometheus recording and alerting rules, the skill provides actionable insights into service health and burn rates, ensuring teams can proactively manage reliability through data-driven decisions and automated alerting strategies.

Key Features

01 Multi-window burn rate alerting to minimize notification fatigue

02 Standardized SLI definitions for availability, latency, and durability

03 Pre-configured Grafana dashboard structures for reliability visualization

04 Automated error budget calculations and policy enforcement patterns

05 Prometheus recording rules for streamlined metrics tracking

06 0 GitHub stars

Use Cases

01 Reducing alert noise through sophisticated multi-window burn rate monitoring

02 Establishing internal reliability targets for production microservices

03 Implementing SRE-driven error budget policies to prioritize stability over new features

Install with 🐟 Skill.Fish

npx skillfish add engineerwithai/engineerwith-agents slo-implementation
