概要
This skill empowers SREs and DevOps teams by automating the setup of comprehensive alerting rules across various metrics like latency, error rates, and resource utilization. It guides users through the process of defining intelligent thresholds, configuring escalation policies, and generating actionable runbooks, ensuring high system availability while minimizing alert fatigue through statistical analysis and historical data alignment.