About
This skill provides a specialized framework for tuning Reinforcement Learning (RL) agents that trade in financial environments, specifically on the Alpaca trading platform. It implements a risk-aware composite reward structure that addresses two common pitfalls: HOLD bias, and reward hacking through overtrading. By rebalancing the reward weights toward P&L-driven objectives and using linear gradient clamping instead of saturating activation functions, the skill aims for more robust model convergence and more realistic trading behavior on hourly market horizons.
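As a rough illustration of the idea, the sketch below combines a P&L term with small penalties for drawdown, trading, and idling, then bounds the result with a linear clamp rather than a saturating function such as tanh. Everything in it is an assumption made for illustration: the names `RewardWeights`, `linear_clamp`, and `composite_reward`, the weight values, the specific penalty terms, and the reading of "linear clamping" as a hard clip on the reward. It is not the skill's actual implementation and does not call the Alpaca API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the P&L term dominates, penalties only shape behaviour.
    pnl: float = 1.0
    drawdown: float = 0.3
    turnover: float = 0.1   # charged whenever the agent trades (discourages overtrading)
    idle: float = 0.05      # charged when the agent holds despite a signal (counters HOLD bias)


def linear_clamp(x: float, limit: float = 1.0) -> float:
    """Clip x to [-limit, limit]; values inside the band pass through unchanged."""
    return max(-limit, min(limit, x))


def composite_reward(
    pnl_return: float,
    drawdown: float,
    traded: bool,
    idle_with_signal: bool,
    weights: RewardWeights = RewardWeights(),
    limit: float = 1.0,
) -> float:
    """Weighted sum of a P&L term and risk/behaviour penalties for one hourly step."""
    raw = (
        weights.pnl * pnl_return
        - weights.drawdown * drawdown
        - weights.turnover * (1.0 if traded else 0.0)
        - weights.idle * (1.0 if idle_with_signal else 0.0)
    )
    return linear_clamp(raw, limit)


if __name__ == "__main__":
    # A profitable step with no trade and no missed signal keeps its raw P&L.
    print(composite_reward(0.02, 0.0, traded=False, idle_with_signal=False))
    # Inside the band the clamp is the identity, whereas tanh already compresses.
    print(linear_clamp(0.5), math.tanh(0.5))
```

The design point is that the clamp leaves rewards inside the band unscaled, so differences between moderately good and moderately bad steps are preserved, while a saturating function compresses them progressively as they grow. In practice the per-step return would likely need rescaling so that typical hourly values occupy a useful part of the band.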