Overview
This skill gives Claude a structured framework for troubleshooting Kubernetes in the role of an SRE or DevOps engineer. It includes Python diagnostic scripts that assess cluster and namespace health, playbooks for common errors such as CrashLoopBackOff and OOMKilled, and incident response protocols. For problems ranging from network latency and storage failures to Helm deployment issues, it supplies evidence-based diagnostic patterns and remediation steps for restoring stability and improving performance.
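To make the idea of a namespace health check concrete, here is a minimal sketch of how such a check might flag CrashLoopBackOff and OOMKilled containers. It is not one of the skill's bundled scripts: the function name `find_pod_issues` and the `SAMPLE` data are hypothetical, and the only assumption about the input is that it follows the Pod list shape returned by `kubectl get pods -o json`.

```python
def find_pod_issues(pod_list):
    """Return (pod, container, issue) tuples for two common failure states.

    Hypothetical helper for illustration; expects a parsed Pod list
    (a dict with an "items" key), as produced by `kubectl get pods -o json`.
    """
    issues = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for cs in statuses:
            # A container that is currently waiting carries a reason string.
            waiting = cs.get("state", {}).get("waiting") or {}
            # The previous termination records why the container last exited.
            last_term = cs.get("lastState", {}).get("terminated") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                issues.append((pod_name, cs.get("name"), "CrashLoopBackOff"))
            if last_term.get("reason") == "OOMKilled":
                issues.append((pod_name, cs.get("name"), "OOMKilled"))
    return issues


# Hypothetical sample data: one crash-looping pod and one healthy pod.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "api-7d9f"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "api",
                        "restartCount": 12,
                        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                        "lastState": {"terminated": {"reason": "OOMKilled"}},
                    }
                ]
            },
        },
        {
            "metadata": {"name": "web-1"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "web",
                        "restartCount": 0,
                        "state": {"running": {}},
                        "lastState": {},
                    }
                ]
            },
        },
    ]
}

if __name__ == "__main__":
    for pod_name, container, issue in find_pod_issues(SAMPLE):
        print(f"{pod_name}/{container}: {issue}")
```

Against a live cluster, the same function could be fed the parsed output of `kubectl get pods -n <namespace> -o json`, for example via `json.load(sys.stdin)`. Reporting both the current waiting reason and the last termination reason matters because a crash-looping container is often the symptom while the earlier OOM kill is the cause.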