Automates the collection of GPU node health, Kubernetes logs, and API diagnostics for CoreWeave cloud infrastructure troubleshooting.
The CoreWeave Debug Bundle is a Claude Code skill that streamlines troubleshooting for GPU-intensive workloads on CoreWeave infrastructure. It gathers critical diagnostic data (GPU allocation metrics, NVIDIA device plugin status, failed pod logs, and API connectivity state) into a portable archive for support tickets or internal analysis. It is aimed at DevOps engineers managing inference workloads who need to find out quickly why pods are stuck in Pending, why nodes are underperforming, or why API requests are failing.
Key Features
01 Comprehensive Kubernetes pod status and cluster-level event logging
02 Automated diagnostic archive generation (tar.gz) for support tickets
03 Real-time GPU resource allocation and device plugin health checks
04 CoreWeave API connectivity testing and rate limit header analysis
05 Targeted extraction of logs from failed pods and OOMKilled events
06 2,083 GitHub stars
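The archive-generation step can be illustrated with a minimal Python sketch. It is not the skill's actual implementation, which this listing does not show. The function name `build_debug_bundle`, the `debug-bundle/` folder prefix, and the section names are all hypothetical. The section text is assumed to have been captured already, for example from `kubectl get pods -o wide` or `kubectl get events`.

```python
import io
import tarfile
import time


def build_debug_bundle(sections: dict[str, str], bundle_path: str) -> list[str]:
    """Write each diagnostic section as a text file in a gzip-compressed tar archive.

    `sections` maps a (hypothetical) section name such as "pods" or "events"
    to the text already captured for it. Returns the member names written.
    """
    written = []
    with tarfile.open(bundle_path, "w:gz") as archive:
        # Sort so the archive layout is deterministic across runs.
        for name, text in sorted(sections.items()):
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=f"debug-bundle/{name}.txt")
            info.size = len(data)
            info.mtime = int(time.time())
            archive.addfile(info, io.BytesIO(data))
            written.append(info.name)
    return written
```

Building the archive from in-memory strings keeps the sketch independent of how each section was collected, so the same function could package command output, API responses, or log excerpts.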
Use Cases
01 Preparing comprehensive technical data for CoreWeave support escalations
02 Debugging Out-Of-Memory (OOM) failures in large-scale AI inference workloads
03 Troubleshooting GPU pods stuck in a 'Pending' state due to quota or allocation issues
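For the Pending and OOM use cases above, the triage step might look like the following Python sketch. It assumes the input is the JSON text produced by `kubectl get pods -o json` and relies only on standard Kubernetes pod status fields (`status.phase` and the `terminated.reason` of each container status). The function name `find_problem_pods` and the report keys are hypothetical, and the skill itself may work differently.

```python
import json


def find_problem_pods(pod_list_json: str) -> dict[str, list[str]]:
    """Return names of pods that are Pending or have an OOMKilled container."""
    pods = json.loads(pod_list_json).get("items", [])
    report: dict[str, list[str]] = {"pending": [], "oom_killed": []}
    for pod in pods:
        name = pod.get("metadata", {}).get("name", "<unknown>")
        status = pod.get("status", {})
        if status.get("phase") == "Pending":
            report["pending"].append(name)
        for container in status.get("containerStatuses", []):
            # A container killed for memory shows reason "OOMKilled" either in
            # its current state or, after a restart, in its last state.
            for key in ("state", "lastState"):
                terminated = (container.get(key) or {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled" and name not in report["oom_killed"]:
                    report["oom_killed"].append(name)
    return report
```

Checking `lastState` as well as `state` matters because a pod that was OOM-killed and then restarted reports a running container in `state`, with the kill recorded only in `lastState`.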