About
This skill provides a specialized workflow for fine-tuning reward scaling in GPU-native PPO training environments, specifically to address the 'phantom MaxDD' issue. It targets cases where an overly aggressive reward scale makes the simulated equity curve swing wildly, so validation reports misleading results, such as a large Maximum Drawdown (MaxDD), even when the model itself performs well. By recalibrating the reward scale and adjusting the thresholds used to gate models, the skill ensures that metrics like MaxDD reflect genuine training stability and reward volatility rather than scaling artifacts, which makes model selection and evaluation in financial contexts more reliable.
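To illustrate how a phantom MaxDD can arise, here is a minimal sketch. It assumes a validation loop that rebuilds equity by compounding per-step rewards as if they were returns, and a training-time multiplier (`REWARD_SCALE`) applied to raw per-step returns. The names `simulate_equity`, `max_drawdown`, `passes_gate`, `REWARD_SCALE`, the sample returns, and the 0.25 gate threshold are all hypothetical and not part of the skill itself. Pure Python is used for clarity. A GPU-native pipeline would do the same arithmetic on batched tensors.

```python
from typing import Sequence

# Hypothetical aggressive training-time multiplier on raw per-step returns.
REWARD_SCALE = 20.0


def simulate_equity(step_returns: Sequence[float], start: float = 1.0) -> list[float]:
    """Compound per-step returns into an equity curve: e[t] = e[t-1] * (1 + r[t])."""
    equity = [start]
    for r in step_returns:
        equity.append(equity[-1] * (1.0 + r))
    return equity


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def passes_gate(mdd: float, max_allowed_mdd: float) -> bool:
    """Hypothetical gating rule: accept a checkpoint only if MaxDD is within the threshold."""
    return mdd <= max_allowed_mdd


raw_returns = [0.01, -0.02, 0.015, -0.01]
scaled_rewards = [REWARD_SCALE * r for r in raw_returns]

# Phantom: scaled rewards are compounded as if they were real returns.
phantom_mdd = max_drawdown(simulate_equity(scaled_rewards))
# Recalibrated: the scale is divided back out before equity is simulated.
true_mdd = max_drawdown(simulate_equity([r / REWARD_SCALE for r in scaled_rewards]))

print(f"phantom MaxDD={phantom_mdd:.2f}, recalibrated MaxDD={true_mdd:.2f}")
# prints: phantom MaxDD=0.40, recalibrated MaxDD=0.02
```

With a 20x scale, the same trades show a 40% drawdown instead of 2%. A 0.25 MaxDD gate would therefore reject a model that actually passes. Removing the scale before simulating equity, or keeping the scale and loosening the gate accordingly, are the two levers the description above refers to. The first keeps MaxDD in real units.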