Generates SLA, SLO, and SLI monitoring configurations for tracking system reliability and managing error budgets.
The sla-monitor-generator skill automates the creation of standardized monitoring configurations based on Site Reliability Engineering (SRE) principles. It helps developers and platform engineers define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical metrics such as API availability, latency percentiles, and error rates. The skill generates ready-to-use YAML definitions and Prometheus Alertmanager rules, so teams can track their error budgets and receive burn-rate alerts, that is, alerts that fire when the budget is being consumed faster than the SLO window allows. This supports a data-driven balance between feature velocity and system stability.
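The arithmetic behind error budgets and burn rates can be sketched as follows. This is a minimal illustration, not the skill's actual output: the function names, the 99.9% target, and the 30-day window are assumptions chosen for the example.

```python
# Illustrative sketch only: these helper names are hypothetical and are
# not part of the sla-monitor-generator skill.

def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring.

    A burn rate of 1 spends the whole budget precisely at the end of the
    SLO window; a burn rate of 2 spends it in half that time.
    """
    return observed_error_ratio / error_budget(slo_target)


def budget_consumed(burn: float, alert_window_hours: float,
                    slo_window_days: int = 30) -> float:
    """Fraction of the total error budget spent during the alert window."""
    return burn * alert_window_hours / (slo_window_days * 24)


if __name__ == "__main__":
    slo = 0.999  # assumed target: 99.9% availability
    rate = burn_rate(observed_error_ratio=0.0144, slo_target=slo)
    print(f"burn rate: {rate:.1f}")
    print(f"budget spent in 1h: {budget_consumed(rate, 1):.1%}")
```

Under these assumptions, a 1.44% error ratio against a 99.9% SLO corresponds to a burn rate of about 14.4, which spends roughly 2% of a 30-day budget in one hour.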
Key Features
01 Automated SLI/SLO definition generation based on industry standards
02 Support for multi-dimensional tracking (availability, latency, error rates)
03 Integration of SRE best practices for reliability targeting
04 Prometheus Alertmanager rule configuration for burn-rate alerting
05 Customizable error budget management windows
06 2 GitHub stars
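To make the burn-rate alerting feature concrete, here is a hedged sketch of how a multi-window burn-rate rule in the Prometheus rule-file format might be assembled. It does not reproduce the skill's real output. The metric name http_requests_total, the job and code labels, the alert name, and the 14.4 factor with 1h/5m windows (a commonly cited pairing for a 30-day SLO) are all assumptions for illustration.

```python
# Illustrative sketch only: metric name, labels, alert name, and thresholds
# are assumptions, not values taken from the sla-monitor-generator skill.

def error_ratio(metric: str, job: str, window: str) -> str:
    """PromQL for the share of 5xx responses among all responses in a window."""
    errors = f'sum(rate({metric}{{job="{job}",code=~"5.."}}[{window}]))'
    total = f'sum(rate({metric}{{job="{job}"}}[{window}]))'
    return f"{errors} / {total}"


def burn_rate_expr(metric: str, job: str, slo_target: float, factor: float,
                   long_window: str, short_window: str) -> str:
    """Fire only when both the long and the short window exceed the threshold.

    The threshold is the burn-rate factor multiplied by the error budget.
    The short window lets the alert clear soon after the problem stops.
    """
    threshold = f"{factor * (1.0 - slo_target):.6g}"
    long_cond = f"({error_ratio(metric, job, long_window)}) > {threshold}"
    short_cond = f"({error_ratio(metric, job, short_window)}) > {threshold}"
    return f"({long_cond}) and ({short_cond})"


def render_rule(alert_name: str, expr: str, severity: str) -> str:
    """Render a one-rule Prometheus rule file as YAML text."""
    lines = [
        "groups:",
        "  - name: slo-burn-rate",
        "    rules:",
        f"      - alert: {alert_name}",
        "        expr: >-",
        f"          {expr}",
        "        labels:",
        f"          severity: {severity}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    expr = burn_rate_expr("http_requests_total", "api", 0.999, 14.4, "1h", "5m")
    print(render_rule("ApiErrorBudgetFastBurn", expr, "page"))
```

Requiring both windows to exceed the threshold is a design choice: the long window filters out brief spikes, while the short window stops the alert from lingering after the error rate has recovered.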
Use Cases
01 Implementing error budget alerts to prevent service level agreement breaches
02 Establishing reliability tracking for a new production microservice
03 Standardizing observability configurations across multiple development teams