01 Elimination of overtrading by removing artificial trading incentives
02 Optimized discount factor (Gamma) for hourly trading timeframes
03 Risk-aware composite reward rebalancing for trading agents
04 Calibrated PPO hyperparameters with 3x entropy coefficient increase
05 0 GitHub stars
06 Gradient preservation via linear P&L clamping up to ±2%
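
The last item, linear P&L clamping, can be illustrated with a minimal sketch. It assumes the per-step P&L is expressed as a fraction of equity and that "clamping up to ±2%" means the reward tracks P&L linearly inside that band and is held flat outside it. The function name `clamp_pnl_reward` and the `limit` parameter are hypothetical and do not come from the project itself.

```python
import numpy as np


def clamp_pnl_reward(pnl_fraction, limit=0.02):
    """Hypothetical reward term: linear in P&L within [-limit, +limit].

    Inside the band the reward equals the P&L itself, so small gains and
    losses keep distinct reward values. Outside the band the value is
    held at the boundary, which bounds the effect of outlier moves.
    """
    pnl = np.asarray(pnl_fraction, dtype=float)
    return np.clip(pnl, -limit, limit)


if __name__ == "__main__":
    # Assumed sample of hourly P&L values, as fractions of equity.
    sample = [-0.10, -0.01, 0.0, 0.015, 0.05]
    print(clamp_pnl_reward(sample))
```

Compared with a saturating squash such as `tanh`, a linear band leaves the reward proportional to P&L for ordinary moves, which is one plausible reading of "gradient preservation" here.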