- Automated weight synchronization across trainer and generator nodes
- Native support for modern RL loss functions including GRPO, DAPO, and SAPO
- Infrastructure isolation for pure algorithm-focused RL development
- Scalable distributed training via Monarch and TorchTitan integration
- High-performance inference and sampling using vLLM for rapid generation
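As a rough illustration of the loss-function support mentioned above, the sketch below shows the core of a GRPO-style objective: rewards are normalized within a sampled group to form advantages, and a clipped importance ratio bounds the policy update. This is a minimal, dependency-free sketch of the general technique, not the library's actual API; the function name and signature are hypothetical.

```python
import math

def grpo_loss(logprobs, old_logprobs, rewards, clip_eps=0.2):
    """Sketch of a GRPO-style clipped policy loss over one sampled group.

    logprobs / old_logprobs: per-sample log-probabilities under the current
    and behavior policies; rewards: scalar reward per sample in the group.
    Hypothetical helper for illustration only.
    """
    n = len(rewards)
    # Group-relative advantage: normalize rewards within the group.
    mean_r = sum(rewards) / n
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / n) + 1e-8
    adv = [(r - mean_r) / std_r for r in rewards]

    total = 0.0
    for lp, olp, a in zip(logprobs, old_logprobs, adv):
        ratio = math.exp(lp - olp)          # importance ratio pi/pi_old
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        # Pessimistic (min) of clipped and unclipped surrogate objectives.
        total += min(ratio * a, clipped * a)
    return -total / n  # negate: we minimize the loss
```

When the current and behavior log-probabilities coincide, every ratio is 1 and the loss reduces to the negative mean of the normalized advantages, which is zero by construction.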