Overview
The DeepSpeed Skill provides expert guidance and implementation patterns for distributed training of massive AI models. It specializes in Zero Redundancy Optimizer (ZeRO) stages, mixed-precision training (FP16/BF16/FP8), and advanced memory-management techniques such as DeepNVMe for high-performance data transfers between storage and GPU memory. This skill is essential for researchers and engineers who need to scale model training, reduce memory overhead, and maximize hardware utilization across multi-GPU and multi-node clusters.
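As a concrete illustration of the patterns this skill covers, here is a minimal sketch of wiring a PyTorch model into DeepSpeed with ZeRO Stage 3, BF16 mixed precision, and CPU offload. The config keys are standard DeepSpeed options, but the model, batch size, and learning rate are illustrative placeholders, not part of this skill's API.

```python
# A minimal sketch, not a definitive recipe: ZeRO Stage 3 + BF16 + CPU offload.
# The model architecture and hyperparameters below are placeholders.
import torch
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))

ds_config = {
    "train_batch_size": 32,
    "bf16": {"enabled": True},                   # BF16 mixed-precision training
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, and optimizer state
        "offload_param": {"device": "cpu"},      # push parameters to host RAM
        "offload_optimizer": {"device": "cpu"},  # push optimizer state to host RAM
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in an engine that manages precision,
# partitioning, and communication; it also builds the configured optimizer.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# A standard training step: the engine handles loss scaling, gradient
# partitioning, and parameter gathering transparently.
inputs = torch.randn(4, 1024, device=engine.device, dtype=torch.bfloat16)
loss = engine(inputs).float().pow(2).mean()
engine.backward(loss)
engine.step()
```

A script like this is typically launched with the `deepspeed` CLI (e.g. `deepspeed train.py`), which spawns one process per GPU and initializes the distributed backend before `deepspeed.initialize` runs.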