Introduction
The Training Resilience skill improves Claude's ability to debug and optimize reinforcement learning (RL) training pipelines, with a focus on PPO-based trading models. It addresses three common pitfalls: incorrect drawdown calculations (absolute versus percentage), inappropriate early-stop triggers, and confusion between the PPO reward signal and the actual equity curve. Its adaptive recovery mechanisms, including automated learning rate reduction and entropy adjustments, help training sessions withstand transient volatility and provide a framework for developing stable algorithmic trading agents.
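To make the absolute-versus-percentage distinction and the adaptive recovery idea concrete, here is a minimal sketch in plain Python. It is illustrative only and not the skill's actual implementation: the names `max_drawdown`, `RecoveryState`, and `adapt_on_drawdown`, the 20% threshold, and the scaling factors are all assumptions chosen for the example. The decision to raise (rather than lower) the entropy coefficient on a breach is also an assumption, since the text only says the entropy term is adjusted.

```python
from dataclasses import dataclass


def max_drawdown(equity):
    """Return (absolute, percentage) maximum drawdown of an equity curve.

    The absolute figure is in account currency; the percentage figure is
    relative to the running peak. Mixing the two up is the pitfall: a drop
    of 30 is large on a peak of 120 but negligible on a peak of 10_000.
    """
    peak = equity[0]
    max_abs = 0.0
    max_pct = 0.0
    for value in equity:
        peak = max(peak, value)          # running high-water mark
        drop = peak - value              # absolute drop from that peak
        max_abs = max(max_abs, drop)
        if peak > 0:
            max_pct = max(max_pct, drop / peak)
    return max_abs, max_pct


@dataclass
class RecoveryState:
    """Hypothetical container for the two hyperparameters being adapted."""
    learning_rate: float
    ent_coef: float


def adapt_on_drawdown(state, equity, pct_threshold=0.20,
                      lr_factor=0.5, ent_factor=1.5,
                      min_lr=1e-6, max_ent=0.1):
    """Soften training instead of stopping when drawdown breaches a threshold.

    The check uses the equity curve, not the PPO reward, and compares the
    percentage drawdown against a percentage threshold. All default values
    here are illustrative assumptions.
    """
    _, dd_pct = max_drawdown(equity)
    if dd_pct < pct_threshold:
        return state                     # within tolerance: no change
    return RecoveryState(
        learning_rate=max(state.learning_rate * lr_factor, min_lr),
        ent_coef=min(state.ent_coef * ent_factor, max_ent),
    )


if __name__ == "__main__":
    curve = [100, 120, 90, 130, 104]
    print(max_drawdown(curve))
    print(adapt_on_drawdown(RecoveryState(3e-4, 0.01), curve))
```

In a real pipeline the adapted values would then be written back to the optimizer and the PPO loss configuration. How that is done depends on the training library, so it is left out of the sketch.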