About
This skill provides Claude with the specialized domain knowledge required to build and maintain production-grade reliability systems. It includes standardized Prometheus query patterns for the Four Golden Signals—Latency, Traffic, Errors, and Saturation—alongside implementation logic for Service Level Indicators (SLIs) and Objectives (SLOs). It empowers developers to configure sophisticated alerting rules, design effective Grafana dashboards, and instrument applications using OpenTelemetry for distributed tracing and structured logging. By utilizing the RED and USE methodologies, this skill ensures that observability stacks are symptom-oriented and actionable rather than just noise-generating.