Can this skill help define Service Level Objectives (SLOs)?

Yes. It provides specific guidance on defining SLIs and SLOs, and on alerting on error budget burn rate thresholds rather than raw metric thresholds.
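To make the difference between a burn rate threshold and a raw metric threshold concrete, here is a minimal Python sketch. It is not taken from the skill itself. The function names are hypothetical, and the default threshold of 14.4 with a short and a long window is an assumption borrowed from the common multi-window convention for a 30-day SLO period.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than sustainable the error budget is spent.

    A burn rate of 1.0 means the budget would be used up exactly at the end
    of the SLO period; higher values exhaust it sooner.
    """
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def should_page(short_window_ratio: float,
                long_window_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when both windows burn budget faster than the threshold.

    Requiring both windows (assumed here: a short and a long one, such as
    5 minutes and 1 hour) filters out brief spikes that have already ended.
    """
    return (burn_rate(short_window_ratio, slo_target) >= threshold
            and burn_rate(long_window_ratio, slo_target) >= threshold)
```

With a 99.9% SLO, a 2% error ratio corresponds to a burn rate of roughly 20, so it would page if the long window confirms it. A raw "errors above 1%" threshold, by contrast, says nothing about how quickly the budget is being consumed.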

Which observability stacks are supported by this skill?

The skill supports a wide range of tools including ELK/OpenSearch, Prometheus, Grafana, Datadog, New Relic, and vendor-neutral OpenTelemetry configurations.

How does it handle distributed tracing?

It follows the Software Engineer by RN competency matrix and requires every log entry to carry trace_id and span_id, so that logs can be correlated with traces and metrics.
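As an illustration of what a log entry carrying trace_id and span_id can look like, the sketch below uses only the Python standard library. It is an assumption-laden example, not the skill's own code: start_span and JsonTraceFormatter are hypothetical names, and the IDs are generated locally, whereas a real service would read them from its tracing SDK. The ID lengths follow the W3C Trace Context format (16-byte trace ID, 8-byte span ID, hex encoded).

```python
import json
import logging
import secrets
from contextvars import ContextVar

# Hypothetical holder for the active trace context. A real service would
# take these IDs from its tracing library instead of generating them here.
_trace_ctx = ContextVar("trace_ctx", default=None)


def start_span(trace_id=None):
    """Create a new span context, reusing trace_id when one is passed in."""
    ctx = {
        "trace_id": trace_id or secrets.token_hex(16),  # 32 hex characters
        "span_id": secrets.token_hex(8),                # 16 hex characters
    }
    _trace_ctx.set(ctx)
    return ctx


class JsonTraceFormatter(logging.Formatter):
    """Render each record as one JSON object that includes the trace IDs."""

    def format(self, record):
        ctx = _trace_ctx.get() or {}
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": ctx.get("trace_id"),
            "span_id": ctx.get("span_id"),
        })


def build_logger(name):
    """Return a logger whose handler emits trace-correlated JSON lines."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonTraceFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Calling start_span() at the beginning of a request and then logging through build_logger("checkout") yields JSON lines that a log backend can join to the matching trace by trace_id.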

Is it compatible with OpenTelemetry?

Yes, OpenTelemetry is treated as the foundational standard, allowing you to choose and switch between different backends independently.

Observability & Monitoring Engineering

Name: Observability & Monitoring Engineering
Author: rnavarych

Analytics & Monitoring

Implements comprehensive observability architectures including structured logging, distributed tracing, and symptom-based alerting systems.

This skill enables engineers to architect and deploy production-grade observability stacks using modern standards like OpenTelemetry. It covers the three pillars of observability (logs, metrics, and traces) with a specific focus on cross-pillar correlation. It helps teams move from reactive debugging to proactive SLO-based management with error budget tracking, and it supports major platforms such as ELK, Prometheus, Grafana, and Datadog.

Key Features

01 OpenTelemetry (OTel) integration and collector configuration

02 RED and USE metrics methodology implementation

03 Structured logging with mandatory trace and span correlation

04 SLI/SLO/SLA definition and error budget tracking

05 Dashboard design for Grafana, Datadog, and ELK/OpenSearch

06 GitHub stars: 11

Use Cases

01 Implementing distributed tracing across microservices to identify latency bottlenecks

02 Setting up symptom-based alerting rules and SLI dashboards for production reliability

03 Designing a vendor-neutral observability backend using OpenTelemetry
