Designs and optimizes production-grade observability strategies featuring SLI/SLO frameworks, alerting logic, and comprehensive monitoring dashboards.
The Observability Designer skill helps engineers build production-ready monitoring systems that integrate the three pillars of observability: metrics, logs, and traces. It provides automated tools to define service level objectives (SLOs), optimize alert routing to prevent alert fatigue, and generate dashboard configurations using frameworks like RED (Rate, Errors, Duration) and USE (Utilization, Saturation, Errors). Whether you're scaling microservices on Kubernetes or managing cloud-native applications, the skill targets deep system visibility and proactive incident detection through structured SLI frameworks and actionable runbooks.
Key Features
01 Distributed tracing and structured logging strategy development
02 Smart alert optimization to reduce noise and improve incident actionability
03 High-fidelity dashboard generation for Prometheus and Grafana based on Golden Signals
04 Automated SLI/SLO framework design with error budget and burn rate calculation
05 Detailed runbook generation for standardized incident response
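The error budget and burn rate calculation named in the feature list can be illustrated with a short sketch. The listing does not show the skill's own tooling, so the function names and inputs below are hypothetical; only the formulas are the commonly used definitions (error budget = 1 - SLO target, burn rate = observed error rate / error budget).

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 means the budget would be spent exactly at the end of
    the SLO window; higher values exhaust it proportionally faster.
    """
    if total <= 0:
        # No traffic in the measurement window: nothing was burned.
        return 0.0
    return (failed / total) / error_budget(slo_target)
```

For example, 50 failed requests out of 10,000 against a 99.9% target give a burn rate of roughly 5, meaning the budget is being consumed about five times faster than the SLO allows.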
Use Cases
01 Auditing and refactoring existing alert configurations to eliminate alert fatigue
02 Designing hierarchical observability dashboards for SRE, Developer, and Executive personas
03 Defining reliability targets and error budgets for a new microservice launch
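When choosing a reliability target for a new service, it can help to translate the target into tolerated downtime. The following is a minimal sketch under the assumption of an availability-style SLO measured over a rolling 30-day window; `allowed_downtime` is a hypothetical helper and not part of the skill.

```python
from datetime import timedelta


def allowed_downtime(slo_target: float,
                     window: timedelta = timedelta(days=30)) -> timedelta:
    """Total downtime the error budget tolerates over the SLO window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The error budget (1 - target) is applied to the full window length.
    return window * (1.0 - slo_target)
```

With the default window, a 99.9% target leaves about 43 minutes of downtime per 30 days, while 99% leaves about 7.2 hours. Comparing candidate targets this way makes the cost of each additional "nine" concrete before launch.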