Diagnoses and resolves Amazon EKS cluster issues, including pod failures, node health, and networking problems using AWS best practices.
This skill provides a comprehensive diagnostic framework for Amazon EKS (Elastic Kubernetes Service), enabling developers to rapidly identify and fix common infrastructure and workload issues. It covers critical scenarios like CrashLoopBackOff, Pending pods, Node NotReady states, IAM/IRSA permission errors, and VPC CNI networking bottlenecks. By providing pre-configured diagnostic workflows and essential kubectl/AWS CLI commands, it streamlines the debugging process for clusters ranging from small development environments to high-scale production systems using tools like Karpenter and EKS managed node groups.
主要功能
01Networking and VPC CNI troubleshooting for IP address exhaustion and DNS issues
020 GitHub stars
03Node health analysis for NotReady states and resource pressure resolution
04Performance monitoring for resource contention and cluster scaling issues
05IAM and IRSA permission verification for service account security audits
06Deep diagnostic workflows for pod failures like CrashLoopBackOff and OOMKilled
使用场景
01Debugging application deployment failures and container crashes in an Amazon EKS environment
02Investigating node scaling failures and resource allocation issues in production clusters
03Resolving complex networking or service connectivity issues within a VPC architecture