Introduction
This skill helps developers and SREs automate the definition of alerting rules for latency, error rates, and resource utilization. It covers setting thresholds, configuring routing and escalation paths, and generating runbooks, with the aim of reducing alert fatigue and shortening incident response times across complex infrastructure.
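
To make the pieces named above concrete (a threshold, a routing target, an escalation path, and a runbook), here is a minimal Python sketch of what one alerting rule might carry and how it could be evaluated. The `AlertRule` class, its field names, the metric names, the threshold values, and the `evaluate` helper are all hypothetical and chosen for illustration; they are not the skill's actual schema or output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule shape; field names are illustrative only."""
    name: str
    metric: str        # key of the metric this rule watches
    threshold: float   # the rule fires when the observed value exceeds this
    severity: str      # e.g. "page" or "ticket"
    route_to: str      # first responder for the alert
    escalate_to: str   # notified if the alert is not acknowledged
    runbook: str       # short pointer to remediation steps


# Assumed example values, one rule per signal mentioned in the text.
RULES = [
    AlertRule("HighLatency", "p99_latency_ms", 500.0, "page",
              "oncall-primary", "oncall-secondary",
              "Check recent deploys, then upstream dependencies."),
    AlertRule("HighErrorRate", "error_rate", 0.05, "page",
              "oncall-primary", "oncall-secondary",
              "Inspect error logs and roll back if a deploy correlates."),
    AlertRule("HighCpu", "cpu_utilization", 0.90, "ticket",
              "platform-team", "oncall-primary",
              "Check for runaway processes; scale out if load is organic."),
]


def evaluate(rules, observed):
    """Return the rules whose metric is present in `observed` and above threshold."""
    return [
        rule for rule in rules
        if rule.metric in observed and observed[rule.metric] > rule.threshold
    ]


if __name__ == "__main__":
    sample = {"p99_latency_ms": 620.0, "error_rate": 0.01, "cpu_utilization": 0.95}
    for rule in evaluate(RULES, sample):
        print(f"{rule.name}: notify {rule.route_to}, "
              f"escalate to {rule.escalate_to}; runbook: {rule.runbook}")
```

Keeping the route, escalation target, and runbook on the rule itself means every alert that fires already says who should act and what to try first, which is one way to keep alerts actionable.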