Introduction
This skill gives Claude the expertise to build and optimize reinforcement learning workflows with PufferLib, a framework designed for high-throughput reinforcement learning. It helps create custom environments through the PufferEnv API, automates vectorization setup for parallel simulation, and provides implementation patterns for optimized PPO and LSTM-based policies. Use this skill to scale training toward millions of steps per second, integrate environments from existing frameworks such as Gymnasium or PettingZoo, and develop multi-agent systems that follow established practices for performance and scalability.