RLOO Policy Optimization - Claude Code Skill for RL