01 Distributed multi-node training support with gang scheduling
02 Managed spot instance execution with automatic preemption recovery
03 Production-grade model serving with autoscaling via Sky Serve
04 Automatic cost-based cloud and region selection for GPU workloads
05 3,983 GitHub stars
06 Unified orchestration for 20+ cloud providers and Kubernetes