Introduction
This skill gives Claude working knowledge of end-to-end reinforcement learning with the Stable Baselines3 (SB3) library. It covers choosing an algorithm such as PPO, SAC, or DQN to fit the task, building custom Gymnasium environments, and writing training callbacks. Whether you are starting a new RL project or tuning an existing training pipeline for sample efficiency, the guidance follows established practice for PyTorch-based agent development, parallel training with vectorized environments, and saving and loading models.
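As a rough illustration of the algorithm-selection step, the sketch below narrows the three algorithms named above by action-space type. It assumes SB3's documented support: DQN handles only `Discrete` actions, SAC only continuous `Box` actions, and PPO handles both plus `MultiDiscrete` and `MultiBinary`. The helper `candidate_algorithms` and the table `SUPPORTED_ACTION_SPACES` are hypothetical names used for illustration and are not part of the SB3 API.

```python
from typing import Dict, List, Set

# Hypothetical lookup table (not part of Stable Baselines3): the action-space
# types each algorithm is assumed to support, keyed by algorithm name.
SUPPORTED_ACTION_SPACES: Dict[str, Set[str]] = {
    "PPO": {"Box", "Discrete", "MultiDiscrete", "MultiBinary"},
    "SAC": {"Box"},
    "DQN": {"Discrete"},
}


def candidate_algorithms(action_space_kind: str) -> List[str]:
    """Return the algorithm names compatible with the given action-space type.

    `action_space_kind` is the Gymnasium space class name, e.g. "Box".
    The result is sorted alphabetically; an unknown kind gives an empty list.
    """
    return sorted(
        name
        for name, kinds in SUPPORTED_ACTION_SPACES.items()
        if action_space_kind in kinds
    )


if __name__ == "__main__":
    for kind in ("Box", "Discrete", "MultiBinary"):
        print(kind, "->", candidate_algorithms(kind))
```

Compatibility is only a first filter. Choosing among the remaining candidates still depends on factors such as sample efficiency and how many environments can be run in parallel.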