How does LR warmup help in this skill?

It prevents large updates from destabilizing the neural network during the first 5% of training by linearly scaling the learning rate from 10% to 100% of its target value.
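
As a rough illustration of the warmup described above, the sketch below computes the learning-rate multiplier in plain Python. The function name, the step counts, and the target learning rate are assumptions made for this example and are not taken from the skill itself; only the 5% warmup span and the 10% starting factor come from the answer above.

```python
def warmup_factor(step: int, total_steps: int,
                  warmup_frac: float = 0.05,
                  start_factor: float = 0.10) -> float:
    """Multiplier applied to the target learning rate at a given step.

    Ramps linearly from start_factor to 1.0 over the first
    warmup_frac of training, then stays at 1.0.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step >= warmup_steps:
        return 1.0
    progress = step / warmup_steps  # 0.0 at the first step, approaches 1.0
    return start_factor + (1.0 - start_factor) * progress


# Hypothetical values, chosen only for illustration.
target_lr = 3e-4
total_steps = 10_000

for step in (0, 250, 500, 9_999):
    print(step, target_lr * warmup_factor(step, total_steps))
```

With these example numbers the warmup covers the first 500 steps, after which the full target learning rate applies.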

What happens if the learning rate decays to zero?

Decaying to zero can cause training to stall; this skill maintains a minimum of 1% of the initial learning rate to allow for continued fine-tuning at the end of the run.
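
One way to picture the 1% floor is a cosine decay that bottoms out at a fraction of the initial learning rate instead of at zero. This is a minimal sketch, assuming a cosine shape (the feature list mentions cosine annealing); the function name and the example values are hypothetical.

```python
import math


def cosine_lr_with_floor(step: int, decay_steps: int,
                         initial_lr: float,
                         min_frac: float = 0.01) -> float:
    """Cosine decay from initial_lr down to min_frac * initial_lr."""
    min_lr = initial_lr * min_frac
    # Clamp progress so steps past the decay horizon stay at the floor.
    progress = min(max(step / decay_steps, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1.0 -> 0.0
    return min_lr + (initial_lr - min_lr) * cosine


# Hypothetical values, chosen only for illustration.
initial_lr = 3e-4
for step in (0, 500, 1_000, 2_000):
    print(step, cosine_lr_with_floor(step, 1_000, initial_lr))
```

Because the floor is added back after scaling, the learning rate never drops below 1% of its initial value, even if training runs past the planned number of steps.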

Why increase the validation frequency?

Standardizing validation intervals ensures you get at least 10 data points per run, providing a clearer view of the learning curve and model dynamics compared to infrequent checks.
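
A simple way to guarantee at least 10 validation points is to derive the interval from the total number of updates. The sketch below is an assumption about how that could be done, not the skill's actual code; both function names are hypothetical.

```python
def validation_interval(total_updates: int, min_points: int = 10) -> int:
    """Largest interval that still yields at least min_points validations."""
    return max(1, total_updates // min_points)


def validation_steps(total_updates: int, min_points: int = 10) -> list[int]:
    """Update indices (1-based) at which validation would run."""
    interval = validation_interval(total_updates, min_points)
    return list(range(interval, total_updates + 1, interval))


# Hypothetical run lengths, chosen only for illustration.
for total in (1_000, 95, 5):
    steps = validation_steps(total)
    print(total, validation_interval(total), len(steps))
```

Runs shorter than 10 updates cannot reach 10 points; in that case the sketch falls back to validating after every update.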

When should I adjust the drawdown penalty weights?

If your model is too conservative (low reward magnitude), reducing the drawdown penalty is recommended, especially if drawdown is already included in the model's observations.
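
To make the effect of the penalty weight concrete, here is a minimal sketch that assumes a linear reward of the form P&L minus a weighted drawdown term. The reward formula, the class and function names, and all numbers are assumptions for illustration; the skill's real reward function is not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Hypothetical weights for a two-term reward."""
    pnl: float = 1.0
    drawdown: float = 0.5


def shaped_reward(step_pnl: float, drawdown: float,
                  weights: RewardWeights) -> float:
    """Assumed reward: weighted P&L minus weighted drawdown."""
    return weights.pnl * step_pnl - weights.drawdown * drawdown


baseline = RewardWeights(pnl=1.0, drawdown=0.5)
relaxed = RewardWeights(pnl=1.0, drawdown=0.1)  # reduced drawdown penalty

# Same hypothetical step: 2% gain while sitting in a 5% drawdown.
print(shaped_reward(0.02, 0.05, baseline))
print(shaped_reward(0.02, 0.05, relaxed))
```

Under the baseline weights the profitable step is still scored negatively, which is the kind of signal that can push an agent toward passive behavior; lowering the drawdown weight makes the same step net positive.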

PPO RL Training Optimizer

Name: PPO RL Training Optimizer
Author: smith6jt-cop


Data Science & ML

Optimizes reinforcement learning training stability and monitoring through learning rate warmups, validation scheduling, and reward weight tuning.

This skill provides a verified workflow for improving the stability and visibility of GPU-native PPO (Proximal Policy Optimization) training runs. It applies a linear learning rate warmup to prevent early divergence, validates more frequently for better monitoring, and tunes reward weights to keep the agent from becoming overly conservative. It is particularly useful in complex training environments where performance plateaus or the agent becomes risk-averse because of high drawdown penalties or too few validation data points.

Key Features

01 Account-aware drawdown penalty optimization

02 Refined reward weighting to balance P&L and exploration

03 Standardized validation intervals for improved training visibility

04 Linear learning rate warmup for early training stabilization

05 0 GitHub stars

06 Sequential LR scheduling combining linear warmup and cosine annealing

Use Cases

01 Stabilizing RL models that show instability or divergence in early training epochs

02 Increasing monitoring frequency for long-running GPU-native training sessions

03 Calibrating trading agents that exhibit overly conservative or passive behavior

Install with 🐟 Skill.Fish

npx skillfish add smith6jt-cop/skills_registry training-improvements-v245

For use in Claude.ai and ChatGPT