About
This skill equips Claude Code with a structured SRE methodology for troubleshooting complex Kubernetes environments. It provides a framework for incident triage, multi-source data collection, and temporal correlation of logs, events, and metrics. Working through five distinct investigation phases, the skill helps identify the root cause of common issues such as CrashLoopBackOff, OOMKilled events, and TLS failures, while adhering to safety-first, read-only investigation principles.
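To make "temporal correlation" concrete, here is a minimal sketch in Python. It is not the skill's own implementation: the `Record` type, the `correlate` helper, and all sample data are hypothetical, invented for illustration. It assumes logs, events, and metrics have already been collected (read-only) and normalized to timestamped records, then merges those that fall within a time window around an incident into one ordered timeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    """One normalized observation (hypothetical shape, not the skill's schema)."""
    timestamp: datetime
    source: str   # "log", "event", or "metric"
    message: str


def correlate(records, anchor, window):
    """Return the records within `window` of `anchor`, ordered by time."""
    nearby = [r for r in records if abs(r.timestamp - anchor) <= window]
    return sorted(nearby, key=lambda r: r.timestamp)


# Hypothetical sample data, invented for illustration only.
t0 = datetime(2024, 1, 1, 12, 0, 0)
records = [
    Record(t0 + timedelta(seconds=50), "event", "Container killed: OOMKilled"),
    Record(t0 - timedelta(minutes=30), "log", "startup complete"),
    Record(t0 + timedelta(seconds=40), "metric", "memory usage at 99% of limit"),
    Record(t0 + timedelta(seconds=55), "log", "process exited with code 137"),
]

# Anchor on the OOMKilled event and look 30 seconds either side of it.
anchor = t0 + timedelta(seconds=50)
timeline = correlate(records, anchor, timedelta(seconds=30))

for r in timeline:
    print(r.timestamp.isoformat(), r.source, r.message)
```

Reading the merged timeline in order (memory metric, then the kill event, then the exit log) is what suggests a causal chain; the unrelated startup log from 30 minutes earlier falls outside the window and is dropped.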