01. FP8 Precision Training optimized for NVIDIA H100 (Hopper) GPUs
02. Sequence and Context Parallelism for long-context windows
03. Advanced 3D Parallelism (Tensor, Pipeline, and Data)
04. Mixture of Experts (MoE) with Expert Parallelism support
05. Automated performance tuning to maximize Model FLOPs Utilization (MFU)
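MFU is conventionally reported as the model FLOPs actually achieved per second divided by the hardware's peak FLOPs per second. The sketch below is a minimal illustration, assuming the common approximation of 6·N FLOPs per training token for a transformer with N parameters (forward plus backward pass, attention FLOPs ignored). The function name `estimate_mfu` and the example numbers are hypothetical and are not this project's API or measured results.

```python
def estimate_mfu(
    n_params: float,
    tokens_per_second: float,
    n_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Estimate Model FLOPs Utilization (MFU).

    Assumes ~6 * n_params FLOPs per training token (about 2N forward,
    4N backward) and ignores attention FLOPs. For an MoE model, pass the
    parameters activated per token rather than the total parameter count.
    """
    if n_params <= 0 or n_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("n_params, n_gpus and peak_flops_per_gpu must be positive")
    if tokens_per_second < 0:
        raise ValueError("tokens_per_second must be non-negative")
    achieved_flops = 6.0 * n_params * tokens_per_second  # model FLOP/s actually done
    peak_flops = n_gpus * peak_flops_per_gpu  # aggregate hardware peak FLOP/s
    return achieved_flops / peak_flops


if __name__ == "__main__":
    # Hypothetical inputs: a 7B dense model on 8 GPUs at 24,000 tokens/s total,
    # with an assumed dense BF16 peak of ~989 TFLOP/s per H100 SXM GPU.
    mfu = estimate_mfu(7e9, 24_000, 8, 989e12)
    print(f"MFU: {mfu:.1%}")
```

Which peak goes in the denominator (BF16 or FP8 tensor-core throughput) changes the reported figure considerably, so an MFU number for FP8 training is only comparable to another when both state the peak they assume.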