Does it support GitOps workflows?

Yes, it includes specific commands for Flux CD to check the status of and reconcile sources, kustomizations, and helmreleases across namespaces.
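
The listing does not show the exact commands the skill runs. As a rough sketch, assuming the standard Flux CLI (`flux get` and `flux reconcile`), the status checks and a single reconcile could be assembled as below. The helper names and the placeholder resource names are hypothetical.

```python
from typing import List

# `flux get` subcommands for the resource kinds named in the answer above.
STATUS_TARGETS: List[List[str]] = [
    ["sources", "git"],
    ["kustomizations"],
    ["helmreleases"],
]


def flux_status_commands() -> List[List[str]]:
    """Build read-only status checks that span all namespaces."""
    return [["flux", "get", *target, "--all-namespaces"] for target in STATUS_TARGETS]


def flux_reconcile_command(kind: str, name: str, namespace: str) -> List[str]:
    """Build a reconcile command for one kustomization or helmrelease."""
    if kind not in ("kustomization", "helmrelease"):
        raise ValueError(f"unsupported kind: {kind}")
    return ["flux", "reconcile", kind, name, "--namespace", namespace]


if __name__ == "__main__":
    for cmd in flux_status_commands():
        print(" ".join(cmd))
    # "apps" and "flux-system" are placeholder names for illustration only.
    print(" ".join(flux_reconcile_command("kustomization", "apps", "flux-system")))
```

The commands are built as argument lists and only printed here, so nothing touches a cluster until a caller chooses to run them.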

Can this skill fix Kubernetes issues automatically?

The skill focuses on read-only investigation and root cause analysis. It gathers data and applies the 5 Whys methodology to suggest the best remediation steps for the user to approve.
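
One way to picture the read-only constraint is a verb allowlist placed in front of kubectl. This is an illustrative sketch, not the skill's actual mechanism; the allowlist contents and the function name are assumptions.

```python
from typing import List

# Assumed allowlist: kubectl verbs that only read cluster state.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}


def is_read_only(kubectl_args: List[str]) -> bool:
    """Return True when the first argument is an allowlisted read-only verb."""
    return bool(kubectl_args) and kubectl_args[0] in READ_ONLY_VERBS


if __name__ == "__main__":
    print(is_read_only(["get", "pods", "--all-namespaces"]))
    print(is_read_only(["delete", "pod", "example"]))
```

The check looks only at the first argument, so a call that starts with a global flag is rejected rather than guessed at.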

What is the 5 Whys approach in this context?

It is a critical analysis methodology that prevents stopping at surface symptoms (like a timeout) and instead pushes the investigation until an actionable root cause, such as a YAML type error, is identified.
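
To make the idea concrete, here is a small sketch of a "why" chain that starts at a timeout and ends at a YAML type error, the two endpoints mentioned above. The intermediate steps and the `WhyChain` class are invented for illustration and are not taken from the skill.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WhyChain:
    """Records a symptom and the successive answers to 'why?'."""
    symptom: str
    whys: List[str] = field(default_factory=list)

    def ask(self, answer: str) -> "WhyChain":
        """Append the answer to the next 'why?' and return self for chaining."""
        self.whys.append(answer)
        return self

    def root_cause(self) -> str:
        """The deepest answer so far, or the symptom if nothing was asked."""
        return self.whys[-1] if self.whys else self.symptom


# Hypothetical investigation; every step below is an invented example.
chain = (
    WhyChain("Requests to the service time out")
    .ask("The Service has no ready endpoints")
    .ask("The backing pods are in CrashLoopBackOff")
    .ask("The container exits while loading its configuration")
    .ask("A rendered config value has the wrong type")
    .ask("The values file sets the port as a quoted string, not an integer")
)

if __name__ == "__main__":
    print(chain.root_cause())
```

Stopping after the first or second answer would leave only a symptom to act on; the last answer is the one that points at a concrete file to change.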

How does this skill handle different Kubernetes clusters?

It uses specific Kubeconfig environments for dev, integration, and live clusters, ensuring all kubectl commands are executed against the correct context after user confirmation.
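
A minimal sketch of that pattern, assuming one kubeconfig file per environment and kubectl's global `--kubeconfig` flag. The file paths, the `confirmed` parameter, and the function name are hypothetical; the listing does not say where the skill keeps its kubeconfigs.

```python
import os
from typing import Dict, List

# Hypothetical kubeconfig locations, one per environment named above.
KUBECONFIGS: Dict[str, str] = {
    "dev": "~/.kube/dev.yaml",
    "integration": "~/.kube/integration.yaml",
    "live": "~/.kube/live.yaml",
}


def kubectl_command(cluster: str, args: List[str], confirmed: bool = False) -> List[str]:
    """Build a kubectl call pinned to one cluster's kubeconfig.

    Refuses to build anything until the user has confirmed the target cluster.
    """
    if cluster not in KUBECONFIGS:
        raise ValueError(f"unknown cluster: {cluster}")
    if not confirmed:
        raise PermissionError(f"target cluster '{cluster}' has not been confirmed")
    path = os.path.expanduser(KUBECONFIGS[cluster])
    return ["kubectl", "--kubeconfig", path, *args]


if __name__ == "__main__":
    print(" ".join(kubectl_command("dev", ["get", "pods", "--all-namespaces"], confirmed=True)))
```

Passing the kubeconfig explicitly on every call avoids depending on whichever context happens to be active in the shell.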

How does it handle unfamiliar services or third-party charts?

It can spawn a specialized sub-agent to research official documentation for services like Cert-manager, Longhorn, or Grafana to identify common failure modes and health indicators.

Kubernetes SRE & Debugging

Name: Kubernetes SRE & Debugging
Author: ionfury

Deployment and DevOps

Diagnoses and resolves Kubernetes incidents using a structured 5-Whys methodology and multi-cluster investigation patterns.

The Kubernetes SRE skill empowers Claude to act as a Site Reliability Engineer, providing a systematic approach to investigating pod failures, service degradations, and deployment issues. It enforces rigorous root cause analysis using the 5 Whys principle and ensures safe, read-only data collection across multiple environments including dev, integration, and live clusters. From debugging CrashLoopBackOff states and OOMKills to reconciling Flux GitOps resources, this skill provides the context-aware commands and documentation-lookup strategies needed to move beyond symptoms and fix the underlying infrastructure problems.

Key Features

01 Flux GitOps reconciliation and status tracking for Helm releases and Kustomizations

02 Structured 5 Whys root cause analysis methodology

03 22 GitHub stars

04 Automated investigation phases from triage to remediation

05 Multi-cluster management with scoped Kubeconfigs (Dev, Integration, Live)

06 Integrated documentation research for unfamiliar services and Helm charts

Use Cases

01 Root cause analysis for service unreachability and resource-related crashes

02 Investigating pods stuck in CrashLoopBackOff, ImagePullBackOff, or Pending states

03 Troubleshooting failed Flux GitOps kustomizations and helmreleases

Install with 🐟 Skill.Fish

npx skillfish add ionfury/homelab k8s-sre

For use in Claude.ai and ChatGPT
