Can I use this for capacity planning?

Yes. It includes utilization-tracking formulas and growth-projection models that help you estimate when to scale your infrastructure based on QPS (queries per second) trends.
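The skill's own formulas are not reproduced on this page. As a rough illustration of what a growth projection can look like, the sketch below assumes compound monthly growth in peak QPS and a target utilization ceiling. The function name, parameters, and the 70% default are hypothetical, not taken from the skill.

```python
import math


def months_until_scale(
    current_qps: float,
    capacity_qps: float,
    monthly_growth: float,
    target_utilization: float = 0.7,  # assumed ceiling, not from the skill
) -> float:
    """Estimate months until peak QPS reaches target_utilization * capacity_qps.

    Assumes compound growth: qps(t) = current_qps * (1 + monthly_growth) ** t.
    """
    if current_qps <= 0 or capacity_qps <= 0:
        raise ValueError("QPS values must be positive")
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive for a projection")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")

    threshold = capacity_qps * target_utilization
    if current_qps >= threshold:
        return 0.0  # already at or above the ceiling: scale now
    # Solve current_qps * (1 + g) ** t = threshold for t.
    return math.log(threshold / current_qps) / math.log(1 + monthly_growth)
```

Under these assumptions, a service at 500 QPS with 1,000 QPS of capacity and 10% monthly growth reaches the 70% ceiling in roughly 3.5 months. Real traffic is rarely this smooth, so treat the result as an estimate to revisit as new data arrives.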

Which reliability patterns are included in this skill?

The skill includes implementation logic for Circuit Breakers, Exponential Backoff with Jitter, Token Bucket Rate Limiting, and Bulkhead patterns to prevent cascading failures.
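To make one of these patterns concrete, here is a minimal sketch of exponential backoff with "full jitter", where each delay is drawn uniformly between zero and a capped exponential bound. It is an illustration only, not the skill's implementation. The names `backoff_delays` and `retry` and all default values are assumptions.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(
    base: float,
    cap: float,
    attempts: int,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2**n))."""
    for n in range(attempts):
        yield rng() * min(cap, base * (2 ** n))


def retry(
    fn: Callable[[], T],
    attempts: int = 5,
    base: float = 0.1,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times, sleeping a jittered delay between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = list(backoff_delays(base, cap, attempts - 1))
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

The jitter spreads retries from many clients over time, so a recovering service is less likely to be hit by synchronized retry waves. In production code you would normally retry only on errors known to be transient rather than on every exception.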

Does it support testing system resilience?

Yes. It provides k6 load-testing configurations and fault-injection snippets for chaos engineering, so you can test how your system handles added latency and service failures.

How does this skill help with SLO management?

It provides standardized templates for defining Service Level Indicators (SLIs) and Objectives (SLOs), including specific Prometheus queries to measure and alert on error budget burn rates.
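The Prometheus queries themselves are not shown on this page. The arithmetic behind a burn-rate alert can still be sketched: the burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The sketch below assumes the error ratios have already been computed for a long and a short window. The function names are hypothetical, and the 14.4 default follows the commonly cited rule that a burn rate of 14.4 sustained for one hour consumes 2% of a 30-day budget.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means the budget would be exactly used up over the SLO period.
    """
    budget = 1.0 - slo_target
    if not 0 < budget < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0 <= error_ratio <= 1:
        raise ValueError("error_ratio must be between 0 and 1")
    return error_ratio / budget


def should_page(
    long_window_ratio: float,
    short_window_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed default; tune per SLO period and policy
) -> bool:
    """Page only if both windows burn faster than the threshold.

    The short window guards against paging for an incident that has already ended.
    """
    return (
        burn_rate(long_window_ratio, slo_target) >= threshold
        and burn_rate(short_window_ratio, slo_target) >= threshold
    )
```

For a 99.9% SLO, a sustained 2% error ratio in both windows gives a burn rate of about 20, which would page under this default. The same ratio in the long window alone would not.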

SRE & System Reliability

Name: SRE & System Reliability
Author: TheBushidoCollective

Analytics & Monitoring

Implements SRE principles and reliability patterns to build scalable, fault-tolerant distributed systems.

This skill equips Claude with Site Reliability Engineering (SRE) expertise, focusing on the practical application of SLOs, SLIs, and error budgets. It provides standardized templates for reliability documentation and ready-to-use implementation patterns for critical resilience mechanisms such as circuit breakers, exponential backoff, and bulkheads. Whether you are architecting a new distributed system or hardening an existing one, this skill helps you make your infrastructure observable, scalable, and designed for failure.
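As an illustration of the circuit breaker pattern named above, the following is a minimal single-threaded sketch, not the skill's own code. The class name, thresholds, and the injectable `clock` parameter are assumptions made for the example. A production version would also need locking and per-dependency configuration.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # allow a trial call through
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Trip on reaching the threshold, or re-trip if a half-open trial fails.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to a downstream dependency in `breaker.call(...)` lets callers fail fast while the dependency is unhealthy, which is how the pattern limits cascading failures.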

Key Features

01 Graceful degradation strategies for high-traffic services

02 72 GitHub stars

03 SLO and SLI definition templates with Prometheus query examples

04 Error budget tracking and burn rate alerting logic

05 Resilience patterns including Circuit Breakers and Bulkheads

06 Capacity planning and k6 load testing configurations

Use Cases

01 Automating capacity planning and growth projections for cloud infrastructure

02 Hardening distributed systems against cascading failures using reliability patterns

03 Designing a comprehensive monitoring and alerting strategy for microservices

Install with 🐟 Skill.Fish

npx skillfish add thebushidocollective/han sre-reliability

For use in Claude.ai and ChatGPT
