01 Speculative RL with EAGLE decoding to boost rollout throughput by 25%+
02 Unified FP8 and INT4 quantization-aware training (QAT) for massive MoE models
03 Zero-copy weight synchronization via CUDA IPC and partial rollout recycling
04 Rollout Routing Replay (R3) for bit-wise expert alignment between inference and training
05 Deep integration with SGLang, Megatron-LM, and FlashAttention-3
06 3,983 GitHub stars