Provides expert guidance and implementation patterns for training large-scale Mixture-of-Experts (MoE) models with enterprise-grade reinforcement learning.
Miles is a high-performance, production-ready fork of the slime framework designed for post-training massive models like DeepSeek V3 and Qwen3-MoE. It optimizes the reinforcement learning pipeline by introducing low-precision FP8/INT4 training, bit-wise identical train-inference alignment through Rollout Routing Replay (R3), and speculative RL to boost throughput by over 25%. This skill enables developers to manage complex distributed training configurations, handle quantization-aware training (QAT), and ensure stability when scaling to 1TB+ model sizes.
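The core idea behind Rollout Routing Replay (R3) can be illustrated with a toy top-k router (function names like `rollout_step` and `training_step` are illustrative sketches, not the Miles API): the expert indices chosen during inference rollout are recorded and shipped back with the trajectory, and the training forward pass reuses those indices instead of recomputing them, so tiny numeric differences between inference and training kernels cannot flip expert selection.

```python
# Toy illustration of Rollout Routing Replay (R3): record expert choices
# made by the inference engine, then force the training pass to reuse them.
# All names here are illustrative sketches, not the actual Miles API.

def top_k_experts(logits, k=2):
    """Pick the k highest-scoring experts (ties broken by index)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]

def rollout_step(logits, k=2):
    """Inference-time routing: choose experts and record the choice."""
    chosen = top_k_experts(logits, k)
    return chosen  # this routing record travels back with the trajectory

def training_step(logits, replayed=None, k=2):
    """Training-time routing: replay recorded experts when available."""
    if replayed is not None:
        return replayed              # bit-wise identical expert selection
    return top_k_experts(logits, k)  # fallback: recompute (may diverge)

# Inference kernels (e.g. FP8) can produce slightly different logits than
# training kernels; without replay, a near-tie can flip the routing.
infer_logits = [0.1, 0.5001, 0.5000, 0.2]
train_logits = [0.1, 0.5000, 0.5001, 0.2]  # tiny precision difference

record = rollout_step(infer_logits)           # experts [1, 2]
assert training_step(train_logits, record) == record
assert training_step(train_logits) != record  # recomputing flips the tie
```

The same principle applies at scale: replaying routing removes one source of train-inference mismatch entirely, rather than merely bounding it.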
Key Features
1. Speculative RL workflows using EAGLE for significant rollout speedups
2. Low-precision optimization with unified FP8 and INT4 quantization-aware training
3. Advanced train-inference alignment using TIS/MIS and kernel-level optimizations
4. Large-scale MoE training support for models like DeepSeek V3 and Qwen3-MoE
5. Bit-wise expert alignment via Rollout Routing Replay (R3) technology
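Quantization-aware training (feature 2 above) typically simulates low precision in the forward pass via fake quantization, i.e. quantize-then-dequantize, so the model learns weights that tolerate the deployed precision. A minimal symmetric INT4 sketch, with all names hypothetical and none of the real FP8/INT4 kernel machinery:

```python
# Minimal fake-quantization sketch for symmetric INT4 QAT.
# Illustrative only; Miles's actual FP8/INT4 paths are far more involved.

def fake_quant_int4(values):
    """Quantize to signed 4-bit (-8..7) and dequantize back to float."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 7.0                    # map the largest magnitude to 7
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return [qi * scale for qi in q]       # dequantized low-precision view

weights = [0.70, -0.31, 0.05, -0.70]
qweights = fake_quant_int4(weights)
# The forward pass uses qweights, so training already sees the rounding
# error the INT4 deployment will introduce; in a real implementation,
# gradients flow to the original weights via a straight-through estimator.
assert all(abs(w - q) <= 0.06 for w, q in zip(weights, qweights))
```

Per-tensor scaling as shown is the simplest choice; production QAT usually scales per channel or per block to keep the quantization error tighter.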
Use Cases
1. Training 1TB+ MoE models on distributed clusters of high-end GPUs such as H100/H200
2. Accelerating RL rollout throughput with speculative decoding and online SFT
3. Implementing quantization-aware training to fit massive models within limited VRAM