Can it help resolve IAM/IRSA permission issues?

Yes, it includes workflows to verify service account annotations, check pod-level AWS credentials, and test specific IAM policies using the AWS CLI from within the cluster.
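As an illustration of the annotation and credential checks described above, here is a minimal sketch, not the skill's own implementation. It assumes you have already saved the output of `kubectl get serviceaccount <name> -o json` and `kubectl get pod <name> -o json` as parsed dictionaries; the function name `check_irsa` is hypothetical. It relies on two documented IRSA behaviors: the role is bound through the `eks.amazonaws.com/role-arn` annotation, and the EKS pod identity webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into each container.

```python
# Hypothetical helper: inspects parsed kubectl JSON, does not call the cluster.
IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"
IRSA_ENV_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def check_irsa(service_account: dict, pod: dict) -> list[str]:
    """Return human-readable IRSA problems found in a service account and pod."""
    problems = []

    # 1. The service account must carry the role ARN annotation.
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    if not annotations.get(IRSA_ANNOTATION):
        problems.append(f"service account is missing the {IRSA_ANNOTATION} annotation")

    # 2. Every container should have the variables the webhook injects.
    for container in pod.get("spec", {}).get("containers", []):
        env_names = {entry.get("name") for entry in container.get("env") or []}
        missing = [var for var in IRSA_ENV_VARS if var not in env_names]
        if missing:
            problems.append(
                f"container {container.get('name')} lacks {', '.join(missing)}"
            )
    return problems
```

If the annotation is present but the variables are missing, the pod was probably created before the annotation was added, because injection happens at admission time; recreating the pod is the usual fix. Running `aws sts get-caller-identity` inside the pod then shows which identity the pod actually assumes.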

How do I diagnose networking issues like IP exhaustion?

The skill provides checks for VPC CNI logs and ENI usage to identify when nodes have run out of available IP addresses, and it offers remediation steps such as prefix delegation.
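To see why prefix delegation relieves IP exhaustion, it helps to work through the pod-capacity arithmetic. The sketch below uses the formula AWS documents for the VPC CNI; the function name `max_pods` and the example instance figures are illustrative assumptions, and the per-instance ENI limits must be looked up for your own instance type.

```python
def max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False) -> int:
    """Raw pod IP capacity of a node under the VPC CNI (illustrative helper)."""
    # Each ENI keeps its primary address for itself, so it offers
    # (ips_per_eni - 1) assignable slots.
    slots = enis * (ips_per_eni - 1)
    # With prefix delegation every slot holds a /28 prefix (16 addresses)
    # instead of a single secondary IP.
    addresses = slots * 16 if prefix_delegation else slots
    # +2 covers host-network pods (aws-node, kube-proxy) that need no pod IP.
    return addresses + 2


# Example: an instance type with 3 ENIs and 10 IPv4 addresses per ENI.
print(max_pods(3, 10))                          # secondary-IP mode
print(max_pods(3, 10, prefix_delegation=True))  # prefix delegation
```

In secondary-IP mode this instance tops out at 29 pods, while prefix delegation raises the raw figure to 434. Treat the second number as an upper bound on addresses rather than a recommended pod count: the kubelet max-pods setting is normally capped well below it. Prefix delegation is switched on by setting `ENABLE_PREFIX_DELEGATION=true` on the `aws-node` DaemonSet.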

How does this skill help with Pod failures like CrashLoopBackOff?

It provides specific commands to inspect container exit codes, view previous logs, and check resource limits to identify root causes like application errors or OOM kills.
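The exit-code inspection described above can be sketched as a small function over the output of `kubectl get pod <name> -o json`. This is an illustrative example under stated assumptions, not the skill's actual workflow: the name `diagnose_containers` is hypothetical, and the hints it prints are rules of thumb. It reads only standard pod status fields (`state.waiting.reason`, `lastState.terminated`, `restartCount`).

```python
def diagnose_containers(pod: dict) -> list[str]:
    """Summarize crash evidence from a pod's containerStatuses (illustrative)."""
    findings = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        name = status.get("name", "<unknown>")
        waiting = status.get("state", {}).get("waiting") or {}
        terminated = status.get("lastState", {}).get("terminated") or {}

        if waiting.get("reason") == "CrashLoopBackOff":
            restarts = status.get("restartCount", 0)
            findings.append(f"{name}: CrashLoopBackOff after {restarts} restarts")

        if not terminated:
            continue
        code = terminated.get("exitCode")
        reason = terminated.get("reason", "")
        if reason == "OOMKilled":
            findings.append(f"{name}: OOMKilled, compare memory usage with the limit")
        elif code is not None and code > 128:
            # Exit codes above 128 mean "killed by signal (code - 128)",
            # for example 137 = SIGKILL and 143 = SIGTERM.
            findings.append(f"{name}: terminated by signal {code - 128}")
        elif code not in (0, None):
            findings.append(
                f"{name}: exited with code {code}, check `kubectl logs --previous`"
            )
    return findings
```

The logs of the crashed attempt are available through `kubectl logs <pod> -c <container> --previous`, and `kubectl describe pod <pod>` shows the configured requests and limits alongside recent events.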

Does it support troubleshooting for Karpenter nodes?

Yes, it includes diagnostics for checking Karpenter controller logs, provisioner configurations (Provisioner resources, which newer Karpenter releases replace with NodePool), and events to debug why nodes are not being provisioned.
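A first pass at those Karpenter checks might look like the sketch below, which only assembles the `kubectl` commands so they can be reviewed before anything is run. Several details are assumptions to adjust for your installation: the controller namespace (`kube-system` here; older installs often use `karpenter`), the NodePool-era resource names (`nodepools`, `nodeclaims`, `ec2nodeclasses`), and the helper name `karpenter_diagnostic_commands`, which is hypothetical.

```python
import shlex


def karpenter_diagnostic_commands(
    namespace: str = "kube-system", tail: int = 200
) -> list[list[str]]:
    """Build (but do not run) kubectl commands for a Karpenter triage pass."""
    return [
        # Controller logs: look for launch failures and capacity errors.
        ["kubectl", "logs", "-n", namespace,
         "-l", "app.kubernetes.io/name=karpenter",
         "-c", "controller", f"--tail={tail}"],
        # Provisioning configuration and in-flight node requests.
        ["kubectl", "get", "nodepools,nodeclaims,ec2nodeclasses"],
        # Pods the scheduler could not place, which should trigger provisioning.
        ["kubectl", "get", "events", "--all-namespaces",
         "--field-selector", "reason=FailedScheduling"],
    ]


if __name__ == "__main__":
    for command in karpenter_diagnostic_commands():
        print(shlex.join(command))
```

Returning argument lists instead of shell strings keeps the commands safe to pass to `subprocess.run` without quoting problems, should you choose to execute them.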

Is this skill up to date with modern AWS practices?

Yes, it is based on 2025 AWS best practices, covering modern features like ephemeral containers for debugging and VPC CNI prefix delegation.

EKS Troubleshooting & Diagnostics

Name: EKS Troubleshooting & Diagnostics
Author: adaptationio

Cloud Infrastructure

Diagnoses and resolves Amazon EKS cluster issues, including pod failures, node health, and networking problems using AWS best practices.

This skill provides a comprehensive diagnostic framework for Amazon EKS (Elastic Kubernetes Service), enabling developers to rapidly identify and fix common infrastructure and workload issues. It covers critical scenarios like CrashLoopBackOff, Pending pods, Node NotReady states, IAM/IRSA permission errors, and VPC CNI networking bottlenecks. By providing pre-configured diagnostic workflows and essential kubectl/AWS CLI commands, it streamlines the debugging process for clusters ranging from small development environments to high-scale production systems using tools like Karpenter and EKS managed node groups.

Key Features

01 Networking and VPC CNI troubleshooting for IP address exhaustion and DNS issues

02 0 GitHub stars

03 Node health analysis for NotReady states and resource pressure resolution

04 Performance monitoring for resource contention and cluster scaling issues

05 IAM and IRSA permission verification for service account security audits

06 Deep diagnostic workflows for pod failures like CrashLoopBackOff and OOMKilled
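
The node health analysis listed above can be illustrated with a short sketch that scans the output of `kubectl get nodes -o json` for NotReady nodes and resource pressure. It is an example under stated assumptions rather than the skill's own code: the function name `unhealthy_nodes` is hypothetical, and it reads only the standard node conditions (`Ready`, `MemoryPressure`, `DiskPressure`, `PIDPressure`).

```python
PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure")


def unhealthy_nodes(node_list: dict) -> dict[str, list[str]]:
    """Map node name -> list of issues, omitting healthy nodes (illustrative)."""
    report = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        issues = []
        for cond in node.get("status", {}).get("conditions", []):
            ctype, status = cond.get("type"), cond.get("status")
            # Ready is healthy only when "True"; "False" and "Unknown" both
            # surface as NotReady in kubectl output.
            if ctype == "Ready" and status != "True":
                issues.append(f"NotReady ({cond.get('reason', 'unknown reason')})")
            # Pressure conditions are healthy when "False".
            elif ctype in PRESSURE_CONDITIONS and status == "True":
                issues.append(ctype)
        if issues:
            report[name] = issues
    return report
```

For any node that appears in the report, `kubectl describe node <name>` gives the condition messages, allocated resources, and recent events needed for the next step.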

Use Cases

01 Debugging application deployment failures and container crashes in an Amazon EKS environment

02 Investigating node scaling failures and resource allocation issues in production clusters

03 Resolving complex networking or service connectivity issues within a VPC architecture
