About
This skill gives Claude specialized knowledge for building and optimizing multimodal AI workflows with widely used pre-trained models. It covers zero-shot image classification with CLIP, multilingual speech transcription and translation with Whisper, and image generation with Stable Diffusion and SDXL. Developers can use it to choose an appropriate model size, work within GPU VRAM limits, and apply best practices for embedding-based similarity, translation, and controlled image generation.
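As a rough illustration of the embedding-based similarity behind CLIP zero-shot classification, the sketch below scores one image embedding against several text-prompt embeddings. It computes cosine similarity, scales it by a logit scale, and applies a softmax to get per-label probabilities. It assumes the embeddings have already been produced by CLIP's image and text encoders. The toy vectors, the helper names (cosine_similarity, zero_shot_probs), and the logit scale of 100 (roughly the value CLIP's learned temperature is capped at) are all illustrative assumptions, not part of this skill.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_probs(image_emb, label_embs, logit_scale=100.0):
    # Assumed logit scale of 100; scaled similarities act as logits.
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in label_embs]
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy embeddings standing in for real CLIP encoder outputs.
image_emb = [0.9, 0.1, 0.0]
labels = ["a photo of a cat", "a photo of a dog"]
label_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

probs = zero_shot_probs(image_emb, label_embs)
best = labels[probs.index(max(probs))]
print(best)  # prints "a photo of a cat"
```

In practice the label embeddings are computed once from prompt templates such as "a photo of a {label}" and reused for every image, so classification costs one image encoding plus a few dot products.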