About
This skill provides a framework for establishing observability in distributed systems, organized around four signal types: metrics, logs, traces, and events. It gives specific guidance on Prometheus architecture, PromQL query optimization, and Grafana visualization, and supports defining Service Level Objectives (SLOs) and tracking error budgets. It covers setting up Kubernetes service discovery, configuring Alertmanager for incident response, and building custom exporters, with the aim of keeping systems reliable through time-series analysis and established operational patterns.
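As an illustration of the SLO and error budget idea mentioned above, the following is a minimal sketch of the arithmetic for a request-based SLO. It is not taken from the skill itself: the function name `error_budget`, its parameters, and the sample numbers are all hypothetical and chosen only for this example.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute error budget figures for a request-based SLO.

    Hypothetical helper for illustration only.
    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or failed_requests < 0:
        raise ValueError("request counts must be non-negative")

    # The budget is the number of requests allowed to fail in the window.
    allowed_failures = (1.0 - slo_target) * total_requests

    # Fraction of the budget already spent; with no traffic nothing is spent.
    if allowed_failures > 0:
        consumed_fraction = failed_requests / allowed_failures
    else:
        consumed_fraction = 0.0

    return {
        "allowed_failures": allowed_failures,
        "consumed_fraction": consumed_fraction,
        "remaining_fraction": 1.0 - consumed_fraction,
    }


# Assumed sample window: 1,000,000 requests, 250 failures, 99.9% target.
budget = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
```

With these assumed numbers the window allows roughly 1,000 failed requests, so 250 failures consume about a quarter of the budget. In practice the request counts would typically come from a metrics backend such as Prometheus rather than being passed in by hand.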