Provides expert guidance and implementation patterns for training large-scale Mixture-of-Experts (MoE) models with enterprise-grade reinforcement learning.
Miles is a high-performance, production-ready fork of the slime framework designed for post-training massive models like DeepSeek V3 and Qwen3-MoE. It optimizes the reinforcement learning pipeline by introducing low-precision FP8/INT4 training, bit-wise identical train-inference alignment through Rollout Routing Replay (R3), and speculative RL to boost throughput by over 25%. This skill enables developers to manage complex distributed training configurations, handle quantization-aware training (QAT), and ensure stability when scaling to 1TB+ model sizes.
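The core idea behind Rollout Routing Replay (R3) can be illustrated with a toy top-k router (function names like `rollout_step` and `training_step` are illustrative sketches, not the Miles API): the expert indices chosen during inference rollout are recorded and shipped back with the trajectory, and the training forward pass reuses those indices instead of recomputing them, so tiny numeric differences between inference and training kernels cannot flip expert selection.

```python
# Toy illustration of Rollout Routing Replay (R3): record expert choices
# made by the inference engine, then force the training pass to reuse them.
# All names here are illustrative sketches, not the actual Miles API.

def top_k_experts(logits, k=2):
    """Pick the k highest-scoring experts (ties broken by index)."""
    return sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]

def rollout_step(logits, k=2):
    """Inference-time routing: choose experts and record the choice."""
    chosen = top_k_experts(logits, k)
    return chosen  # this routing record travels back with the trajectory

def training_step(logits, replayed=None, k=2):
    """Training-time routing: replay recorded experts when available."""
    if replayed is not None:
        return replayed              # bit-wise identical expert selection
    return top_k_experts(logits, k)  # fallback: recompute (may diverge)

# Inference kernels (e.g. FP8) can produce slightly different logits than
# training kernels; without replay, a near-tie can flip the routing.
infer_logits = [0.1, 0.5001, 0.5000, 0.2]
train_logits = [0.1, 0.5000, 0.5001, 0.2]  # tiny precision difference

record = rollout_step(infer_logits)           # experts [1, 2]
assert training_step(train_logits, record) == record
assert training_step(train_logits) != record  # recomputing flips the tie
```

The same principle applies at scale: replaying routing removes one source of train-inference mismatch entirely, rather than merely bounding it.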
Key Features
1. Speculative RL workflows using EAGLE for significant rollout speedups
2. Low-precision optimization with unified FP8 and INT4 quantization-aware training
3. Advanced train-inference alignment using TIS/MIS and kernel-level optimizations
4. Large-scale MoE training support for models like DeepSeek V3 and Qwen3-MoE
5. Bit-wise expert alignment via Rollout Routing Replay (R3) technology
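Quantization-aware training (feature 2 above) typically simulates low precision in the forward pass via fake quantization, i.e. quantize-then-dequantize, so the model learns weights that tolerate the deployed precision. A minimal symmetric INT4 sketch, with all names hypothetical and none of the real FP8/INT4 kernel machinery:

```python
# Minimal fake-quantization sketch for symmetric INT4 QAT.
# Illustrative only; Miles's actual FP8/INT4 paths are far more involved.

def fake_quant_int4(values):
    """Quantize to signed 4-bit (-8..7) and dequantize back to float."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 7.0                    # map the largest magnitude to 7
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return [qi * scale for qi in q]       # dequantized low-precision view

weights = [0.70, -0.31, 0.05, -0.70]
qweights = fake_quant_int4(weights)
# The forward pass uses qweights, so training already sees the rounding
# error the INT4 deployment will introduce; in a real implementation,
# gradients flow to the original weights via a straight-through estimator.
assert all(abs(w - q) <= 0.06 for w, q in zip(weights, qweights))
```

Per-tensor scaling as shown is the simplest choice; production QAT usually scales per channel or per block to keep the quantization error tighter.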
Use Cases
1. Training 1TB+ MoE models on distributed clusters of high-end GPUs such as H100/H200
2. Accelerating RL rollout throughput with speculative decoding and online SFT
3. Implementing quantization-aware training to fit massive models within limited VRAM