Does miles support speculative decoding during RL?

Yes. miles supports EAGLE speculative decoding via SGLang, which typically speeds up rollouts by 25-40%: a lightweight draft model proposes several tokens ahead, and the target model verifies them.
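The draft-and-verify idea behind speculative decoding can be sketched in a few lines. This is a toy, greedy version of the generic technique, not EAGLE's draft architecture and not the SGLang or miles implementation; `speculative_step`, `draft_next`, and `target_next` are hypothetical names invented for illustration.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextTokenFn,
                     target_next: NextTokenFn, k: int) -> List[Token]:
    """One greedy draft-and-verify step; returns the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposal. A real engine does this in
    #    one batched forward pass; a loop is used here for clarity.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every proposal matched, so the target adds one bonus token.
    accepted.append(target_next(ctx))
    return accepted


# Toy models (assumptions): the target counts up by one; the draft agrees
# except when the context length is a multiple of 4, where it guesses 0.
def target_next(ctx: List[Token]) -> Token:
    return ctx[-1] + 1


def draft_next(ctx: List[Token]) -> Token:
    return ctx[-1] + 1 if len(ctx) % 4 else 0


out = speculative_step([1], draft_next, target_next, k=4)
print(out)  # [2, 3, 4, 5]
```

The accepted tokens are exactly what the target model would have produced on its own, so the speedup comes from verifying several tokens per target pass rather than from changing the output.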

What is the primary advantage of using miles over slime?

miles is a production-ready fork of slime that adds enterprise-grade features like FP8/INT4 quantization-aware training, speculative RL for higher throughput, and bit-wise train-inference alignment for MoE models.

How does miles handle the memory requirements of 1TB+ models?

miles uses INT4 Quantization-Aware Training (QAT), which can cut VRAM requirements by more than 3x, so that massive models can be trained more efficiently on high-end hardware such as H200 clusters.
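A rough sketch of where a "more than 3x" figure can come from, together with the quantize-dequantize step that QAT applies in the forward pass. The parameter count, group size of 32, and 16-bit scales are illustrative assumptions, not values taken from miles, and the estimate covers weights only (no activations or optimizer state).

```python
from typing import List


def fake_quant_int4(values: List[float], scale: float) -> List[float]:
    """Symmetric INT4 quantize-dequantize: round to [-8, 7], then rescale."""
    out = []
    for v in values:
        q = max(-8, min(7, round(v / scale)))  # clamp to the signed 4-bit range
        out.append(q * scale)
    return out


def weight_bytes(num_params: int, bits_per_weight: int,
                 group_size: int = 0, scale_bits: int = 16) -> float:
    """Estimate weight storage, optionally with one scale per weight group."""
    total = num_params * bits_per_weight / 8
    if group_size:
        total += (num_params / group_size) * scale_bits / 8
    return total


# Assumption: 500B parameters, i.e. about 1 TB of weights in BF16.
num_params = 500_000_000_000
bf16 = weight_bytes(num_params, 16)
int4 = weight_bytes(num_params, 4, group_size=32)
ratio = bf16 / int4

print(fake_quant_int4([0.26, -1.1, 10.0], scale=0.5))  # [0.5, -1.0, 3.5]
print(f"{bf16 / 1e9:.0f} GB -> {int4 / 1e9:.2f} GB ({ratio:.2f}x smaller)")
```

Under these assumptions the per-group scales are what pull the ratio below the ideal 4x that the bit widths alone would suggest.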

What is Rollout Routing Replay (R3)?

R3 is a feature that records expert routing decisions during inference and replays them exactly during training, ensuring bit-wise consistency and preventing policy collapse in Mixture-of-Experts models.
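The record-and-replay idea can be illustrated with a toy top-k router. This is a minimal sketch under stated assumptions, not the miles implementation: `top_k_experts` and `route` are hypothetical helpers, and the logits are made-up numbers chosen so that a small numeric drift between engines flips the selected expert.

```python
from typing import List, Optional


def top_k_experts(logits: List[float], k: int) -> List[int]:
    """Indices of the k largest router logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: -logits[i])[:k]


def route(logits: List[float], k: int,
          replay: Optional[List[int]] = None) -> List[int]:
    """Use recorded expert indices when given, otherwise compute them."""
    if replay is not None:
        return list(replay)
    return top_k_experts(logits, k)


# Rollout (inference engine): compute the routing and record it.
rollout_logits = [0.10, 0.52, 0.51, 0.05]
recorded = route(rollout_logits, k=1)

# Training engine: slightly different numerics swap the top two logits.
train_logits = [0.10, 0.51, 0.52, 0.05]
without_replay = route(train_logits, k=1)
with_replay = route(train_logits, k=1, replay=recorded)

print(recorded, without_replay, with_replay)  # [1] [2] [1]
```

Without replay, the training pass would send the token through a different expert than the one that generated it; with replay, both passes use the same expert.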

Enterprise RL Training (miles)

Name: Enterprise RL Training (miles)
Author: Orchestra-Research

Data Science & ML

Optimizes large-scale Reinforcement Learning training for Mixture-of-Experts models using high-performance quantization and speculative rollout techniques.

miles is an enterprise-grade RL framework designed for the post-training of massive models like DeepSeek V3 and Qwen3-MoE. It addresses critical scaling challenges by providing bit-wise train-inference alignment through Rollout Routing Replay (R3), unified FP8 pipelines, and speculative RL to increase throughput by up to 40%. This skill is essential for engineering teams training 1TB+ models who require production-ready stability, INT4 quantization-aware training to fit large models on limited VRAM, and deep integration with SGLang and Megatron-LM.

Key Features

01. Speculative RL with EAGLE decoding to boost rollout throughput by 25%+

02. Unified FP8 and INT4 quantization-aware training (QAT) for massive MoE models

03. Zero-copy weight synchronization via CUDA IPC and partial rollout recycling

04. Rollout Routing Replay (R3) for bit-wise expert alignment between inference and training

05. Deep integration with SGLang, Megatron-LM, and FlashAttention-3

06. 3,983 GitHub stars

Use Cases

01. Post-training 1TB+ Mixture-of-Experts (MoE) models like DeepSeek V3 or Qwen3-MoE

02. Deploying and fine-tuning large-scale models on limited VRAM using INT4 quantization

03. Optimizing RL training throughput using speculative decoding and online MTP training

Install with 🐟 Skill.Fish

npx skillfish add orchestra-research/ai-research-skills miles
