Introduction
This skill provides an implementation framework for fine-tuning with GRPO (Group Relative Policy Optimization). GRPO is a reinforcement learning approach that works well in two cases: when standard supervised fine-tuning (SFT) underperforms, or when fewer than 1,000 training examples are available. The skill lets Claude guide developers through three tasks: writing custom reward functions, configuring hyperparameters for vision-language models (VLMs), and meeting the infrastructure requirements of AWS SageMaker. The approach is particularly effective for tasks that require strict adherence to an output format such as JSON, or that need highly diverse outputs from vision-based prompts.
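The following minimal Python sketch illustrates the custom reward functions mentioned above. It is not the skill's actual implementation. It uses two hypothetical helpers. `json_format_reward` scores a completion 1.0 if it parses as a JSON object and 0.0 otherwise. `group_relative_advantages` normalizes the rewards within a group of completions sampled for the same prompt, which is the group-relative step GRPO is named for. Using the population standard deviation and adding a small `eps` are assumptions made here; real GRPO implementations differ on these details.

```python
import json
import statistics
from typing import List


def json_format_reward(completion: str) -> float:
    """Hypothetical reward: 1.0 if the completion parses as a JSON object, else 0.0."""
    try:
        parsed = json.loads(completion)
    except json.JSONDecodeError:
        # Malformed JSON earns no reward.
        return 0.0
    # Valid JSON that is not an object (e.g. a list) also earns no reward.
    return 1.0 if isinstance(parsed, dict) else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation plus eps, so a group with
    identical rewards yields all-zero advantages instead of dividing by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one hypothetical prompt.
    group = ['{"label": "cat"}', "label: cat", '{"label": "dog"}', "[1, 2]"]
    rewards = [json_format_reward(c) for c in group]
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print(group_relative_advantages(rewards))
```

Completions that follow the required format receive positive advantages and the rest receive negative ones. Because every score is relative to the group mean, this kind of reward can still provide a learning signal when few labeled examples exist.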