About
This skill provides a framework for Site Reliability Engineering (SRE) practices that helps teams balance release velocity against system reliability. It guides teams in defining service level indicators (SLIs) for availability and latency and in setting service level objectives (SLOs) based on what users actually expect. It also automates error budget calculations. Through Prometheus recording and alerting rules, it tracks service health and error-budget burn rates and alerts on them, so teams can manage reliability proactively, based on measured data rather than intuition.
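The error budget and burn-rate arithmetic behind this kind of alerting fits in a few lines. What follows is a minimal Python sketch, not the skill's own implementation. It assumes that good and total event counts for each time window are already available; in a Prometheus setup these would typically come from recording rules, and this sketch does not query Prometheus. The names `WindowCounts`, `burn_rate`, `budget_remaining` and `should_page` are hypothetical. The 14.4 paging threshold is an assumption borrowed from the multiwindow, multi-burn-rate pattern in Google's SRE Workbook. Burn rate here is the observed error ratio divided by the error budget `1 - SLO`.

```python
from dataclasses import dataclass

# Assumed paging threshold (from the SRE Workbook's multiwindow pattern): a burn
# rate of 14.4 sustained for 1 hour spends 2% of a 30-day error budget.
# Tune this for your own SLO period and paging policy.
PAGE_BURN_RATE = 14.4


@dataclass(frozen=True)
class WindowCounts:
    """Event counts over one time window (hypothetical input shape)."""
    good: int   # events that met the SLI criterion, e.g. non-5xx and under the latency threshold
    total: int  # all valid events in the window


def sli(counts: WindowCounts) -> float:
    """Fraction of good events; an empty window is treated as fully good."""
    if counts.total == 0:
        return 1.0
    return counts.good / counts.total


def error_budget(slo_target: float) -> float:
    """Allowed fraction of bad events, e.g. a 0.999 SLO leaves a 0.001 budget."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(counts: WindowCounts, slo_target: float) -> float:
    """Observed error ratio divided by the allowed error ratio.

    1.0 means the budget would be used up exactly at the end of the SLO period.
    """
    return (1.0 - sli(counts)) / error_budget(slo_target)


def budget_remaining(period: WindowCounts, slo_target: float) -> float:
    """Fraction of the period's error budget left; negative once it is overspent."""
    allowed_bad = error_budget(slo_target) * period.total
    if allowed_bad == 0:
        return 1.0
    actual_bad = period.total - period.good
    return 1.0 - actual_bad / allowed_bad


def should_page(long_window: WindowCounts, short_window: WindowCounts,
                slo_target: float, threshold: float = PAGE_BURN_RATE) -> bool:
    """Page only if both the long (e.g. 1h) and short (e.g. 5m) windows burn too fast."""
    return (burn_rate(long_window, slo_target) > threshold
            and burn_rate(short_window, slo_target) > threshold)


if __name__ == "__main__":
    slo = 0.999
    last_hour = WindowCounts(good=9_850, total=10_000)  # 1.5% of events failed
    last_5m = WindowCounts(good=980, total=1_000)       # 2% of events failed
    print(f"1h burn rate: {burn_rate(last_hour, slo):.1f}")
    print(f"page: {should_page(last_hour, last_5m, slo)}")
```

The sketch pages only when both the long and the short window exceed the threshold. This follows the SRE Workbook's multiwindow approach. The long window keeps brief spikes from paging anyone. The short window lets the alert stop firing soon after the problem is fixed.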