About
This skill lets Claude act as a reinforcement learning expert, providing standardized implementations for training agents with algorithms such as PPO, SAC, and DQN. It streamlines the creation of custom Gymnasium environments, supplies callback structures for monitoring training, and improves training throughput with vectorized environments. Whether you are building a robotics simulation, a game AI, or a financial trading bot, the skill applies best practices for model persistence, evaluation, and deep RL experimentation through the unified Stable Baselines3 API.