01 Account-aware drawdown penalty optimization
02 Refined reward weighting to balance P&L and exploration
03 Standardized validation intervals for improved training visibility
04 Linear learning rate warmup for early training stabilization
05 Sequential LR scheduling combining linear warmup and cosine annealing
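
The two learning-rate items above (linear warmup, then a sequential hand-off to cosine annealing) can be sketched as a single step-to-rate function. This is a minimal illustration in plain Python, not the project's actual implementation: the function name `lr_at_step` and every default value (`base_lr`, `warmup_steps`, `total_steps`, `start_factor`, `min_lr`) are assumptions chosen for the example.

```python
import math


def lr_at_step(step, base_lr=3e-4, warmup_steps=1000, total_steps=10000,
               start_factor=0.1, min_lr=0.0):
    """Learning rate for linear warmup followed by cosine annealing.

    All names and defaults here are hypothetical, picked for illustration.
    """
    if step < warmup_steps:
        # Phase 1: ramp linearly from start_factor * base_lr up to base_lr.
        frac = step / warmup_steps
        return base_lr * (start_factor + (1.0 - start_factor) * frac)

    # Phase 2: cosine decay from base_lr down to min_lr over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at a few points across both phases.
    for s in (0, 500, 1000, 5500, 10000):
        print(s, lr_at_step(s))
```

The low starting rate keeps early updates small while optimizer statistics are still noisy, and the cosine phase then lowers the rate smoothly toward `min_lr`. PyTorch provides `LinearLR`, `CosineAnnealingLR`, and `SequentialLR` in `torch.optim.lr_scheduler` for chaining schedules this way; whether this project uses those classes is not stated here.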