About
This skill provides a specialized framework for configuring Reinforcement Learning (RL) training sessions, specifically PPO, to maximize throughput on high-end NVIDIA GPUs. It prevents common performance bottlenecks, such as under-utilizing A100/H100 cores or leaving compilation disabled, by applying a layered configuration strategy that detects the hardware tier before selecting a training mode. By choosing appropriate parallel-environment counts and minibatch sizes and enabling `torch.compile`, it helps developers achieve 10x or greater speedups in training frames per second (FPS).
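The tier-detection strategy described above can be sketched as a small lookup step that runs before training starts. This is a minimal illustrative example, not the skill's actual implementation: the function name `select_ppo_config`, the tier table, and all environment/minibatch counts are assumptions chosen for demonstration.

```python
from dataclasses import dataclass

@dataclass
class PPOPerfConfig:
    num_envs: int        # parallel environments to run
    minibatch_size: int  # per-update minibatch size
    use_compile: bool    # whether to enable torch.compile

# Hypothetical tier table: high-end datacenter GPUs get more parallel
# environments and larger minibatches; unknown hardware falls back to
# conservative defaults. Values here are illustrative only.
_TIERS = {
    "H100": PPOPerfConfig(num_envs=4096, minibatch_size=32768, use_compile=True),
    "A100": PPOPerfConfig(num_envs=2048, minibatch_size=16384, use_compile=True),
}

def select_ppo_config(device_name: str) -> PPOPerfConfig:
    """Pick a PPO performance config based on the detected GPU name."""
    for tier, cfg in _TIERS.items():
        if tier in device_name:
            return cfg
    # Fallback for consumer or unrecognized GPUs: fewer environments,
    # smaller minibatches, and compilation left off.
    return PPOPerfConfig(num_envs=256, minibatch_size=2048, use_compile=False)

# In a real training script the device name would typically come from
# torch.cuda.get_device_name(0) when CUDA is available.
print(select_ppo_config("NVIDIA H100 80GB HBM3"))
print(select_ppo_config("NVIDIA GeForce RTX 3060"))
```

Keeping the detection step as a pure function of the device-name string makes the tier logic easy to test without GPU hardware present.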