Which training backends are compatible with this skill?

The skill supports multiple backends including PyTorch FSDP, FSDP2, and Megatron-LM for training, along with vLLM and SGLang for rollout generation.

Can I use verl for Vision-Language Models (VLMs)?

Yes, verl supports RL training for Vision-Language Models, and this skill includes configurations for enabling vision-specific rollout engines.

Does this skill support GRPO for reasoning models?

Yes, it provides specific workflows for Group Relative Policy Optimization (GRPO), a critic-free algorithm that is well suited to training reasoning models on math and logic tasks.
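To make "critic-free" concrete, here is a minimal sketch of the group-relative advantage idea behind GRPO: several responses are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group instead of a learned value function. The function name, the `eps` constant, and the choice of population standard deviation are assumptions for illustration, not verl's actual implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds the scalar rewards of all responses sampled for ONE prompt.
    Hypothetical helper for illustration; not part of verl's API.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; real implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one math prompt, scored 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers get a positive advantage, incorrect ones a negative one.
print(advantages)
```

Because the baseline comes from the group itself, no separate critic model has to be trained or held in memory.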

What is verl and why should I use it for LLM training?

verl is an open-source RL training library from ByteDance that implements the HybridFlow framework. It is designed for production-ready, large-scale RL training, supporting models up to 671B parameters with flexible compute backends.

How does verl handle memory issues during rollout?

The skill provides solutions for OOM errors, such as reducing log_prob micro-batch sizes, enabling gradient checkpointing, and using FSDP2 with CPU offloading.
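As a rough sketch, the three mitigations above could be expressed as Hydra-style `key=value` overrides passed to the trainer. The key names below follow verl's published example scripts as best I recall them, and the values are placeholders; treat both as assumptions and check them against the configuration reference for your installed verl version. The `to_overrides` helper is hypothetical.

```python
def to_overrides(settings):
    """Turn a dict of config keys into Hydra-style `key=value` strings."""
    return [f"{key}={value}" for key, value in settings.items()]


# ASSUMPTION: key names recalled from verl example scripts; values are
# illustrative starting points, not recommendations.
oom_settings = {
    # Smaller micro-batches when computing log-probabilities
    "actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu": 4,
    "actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu": 4,
    # Trade compute for memory by recomputing activations
    "actor_rollout_ref.model.enable_gradient_checkpointing": True,
    # FSDP2 with CPU offloading
    "actor_rollout_ref.actor.strategy": "fsdp2",
    "actor_rollout_ref.actor.fsdp_config.offload_policy": True,
}

overrides = to_overrides(oom_settings)

# Print a command line that could be pasted into a launch script.
print(" \\\n    ".join(["python3 -m verl.trainer.main_ppo"] + overrides))
```

Applying one change at a time makes it easier to see which setting actually resolves the out-of-memory error.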

verl LLM Reinforcement Learning

Name: verl LLM Reinforcement Learning
Author: Orchestra-Research

Category: Data Science and ML

Trains large language models using advanced reinforcement learning algorithms like GRPO and PPO with the production-ready verl framework.

verl (Volcano Engine Reinforcement Learning) is a flexible and efficient library for large-scale LLM post-training, powering models that achieve O1-level performance. This skill gives Claude domain-specific guidance for implementing RLHF, GRPO, and PPO at scales up to 671B parameters. Developers can swap training backends such as FSDP and Megatron-LM and rollout engines such as vLLM, which makes the skill useful to AI researchers and engineers building reasoning-heavy models, vision-language agents, or specialized math models on the HybridFlow framework.

Key Features

01 Scalable distributed training configurations for models up to 671B parameters

02 Specialized workflows for math reasoning and vision-language model training

03 Support for advanced RL algorithms including GRPO, PPO, RLOO, and REINFORCE++

04 3,983 GitHub stars

05 Flexible backend integration with FSDP, Megatron-LM, vLLM, and SGLang

06 Support for multi-turn rollout and agentic tool-calling reinforcement learning

Use Cases

01 Scaling PPO training with a separate critic model using Generalized Advantage Estimation

02 Implementing GRPO for training reasoning models on math benchmarks like GSM8K or MATH

03 Post-training large-scale vision-language models with specialized rollout engines
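For readers unfamiliar with the first use case, the following is a minimal sketch of Generalized Advantage Estimation as it is usually defined: a backward pass that accumulates discounted TD errors computed from the critic's value estimates. The function name, argument layout, and default `gamma` and `lam` values are assumptions for illustration and do not reflect verl's internal code.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards: per-step rewards, length T
    values:  critic estimates, length T + 1 (last entry is the bootstrap
             value for the state after the final step; 0.0 if terminal)
    Hypothetical helper for illustration; not part of verl's API.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next one.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# A three-step episode with a single terminal reward and an untrained critic.
print(gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
```

With `lam=0` the estimate reduces to the one-step TD error, and with `lam=1` it becomes the full discounted return minus the value baseline; values in between trade bias against variance.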

Install with 🐟 Skill.Fish

npx skillfish add orchestra-research/ai-research-skills verl
