Introduction
This skill provides a framework for Site Reliability Engineering (SRE) practice built on three concepts: Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets. It helps developers and SREs define measurable reliability targets from Prometheus metrics, build automated alerts driven by error-budget burn rates, and design observability dashboards. An error budget quantifies how much unreliability an SLO tolerates, so it gives teams a data-driven basis for weighing feature velocity against infrastructure stability. While budget remains, teams can keep shipping features. When the budget is being consumed too quickly, they can shift effort to reliability work.
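The relationship between an SLI, an SLO, its error budget, and a burn-rate alert can be sketched in a few lines. This is a minimal illustration, not this skill's implementation. The `Window` type and the function names are hypothetical, and the good/total request counts stand in for values you would normally query from Prometheus. The default paging threshold of 14.4 is an assumption borrowed from the multiwindow, multi-burn-rate example in Google's SRE Workbook. At that rate, 2% of a 30-day budget is spent in one hour (0.02 × 720 h).

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts over one time window (hypothetical input shape)."""
    good: int
    total: int


def sli(w: Window) -> float:
    """Availability SLI: the fraction of requests that were good."""
    if w.total == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return w.good / w.total


def error_budget(slo: float) -> float:
    """Allowed fraction of bad requests, e.g. a 0.999 SLO leaves 0.001."""
    return 1.0 - slo


def burn_rate(w: Window, slo: float) -> float:
    """Budget consumption speed relative to burning exactly on target.

    1.0 means the budget would be used up exactly at the end of the SLO
    period. Higher values mean it runs out proportionally sooner.
    """
    return (1.0 - sli(w)) / error_budget(slo)


def should_page(long_w: Window, short_w: Window, slo: float,
                threshold: float = 14.4) -> bool:
    """Multiwindow burn-rate check.

    Page only if both the long window (e.g. 1h) and the short window
    (e.g. 5m) exceed the threshold. The long window filters out brief
    blips. The short window lets the alert clear quickly once the
    problem stops.
    """
    return (burn_rate(long_w, slo) >= threshold
            and burn_rate(short_w, slo) >= threshold)


if __name__ == "__main__":
    slo = 0.999
    last_hour = Window(good=98_400, total=100_000)   # 1.6% errors
    last_5min = Window(good=9_830, total=10_000)    # 1.7% errors
    print(f"1h burn rate: {burn_rate(last_hour, slo):.1f}")
    print("page:", should_page(last_hour, last_5min, slo))
```

Requiring both windows to breach the threshold is a common design choice for burn-rate alerts. A sustained burn pages promptly, while a short spike that has already ended does not.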