GRPO RLHF Alignment - Claude Code Skill for AI Training