소개
This skill empowers Claude to manage complex ML infrastructure using SkyPilot, allowing for seamless deployment across 20+ cloud providers including AWS, GCP, and Azure. It provides specific patterns for finding the cheapest GPU resources, managing distributed multi-node training, and implementing fault-tolerant managed jobs with automatic spot instance recovery. This is ideal for AI researchers and engineers who need to maximize compute efficiency, avoid vendor lock-in, and automate the lifecycle of high-performance computing clusters.