Acerca de
This skill provides domain-specific guidance for fine-tuning Vision-Language Models (VLMs) such as Pixtral, Ministral VL, and Llama 3.2 Vision. It streamlines the implementation of Unsloth's FastVisionModel for 2x faster training, covering critical aspects like vision-specific LoRA configurations, multi-modal dataset preparation using PIL, and specialized SFTTrainer setups. Whether you are building OCR systems or advanced visual reasoning models, this skill ensures Claude Code follows best practices for vision model optimization and inference.