Overview
The AI Multimodal Processing skill gives Claude access to Google Gemini's multimodal capabilities for analyzing audio, video, image, and document assets. Through a single interface, it can transcribe hours of audio, detect scenes in long-form video, extract structured data from multi-page PDFs, and generate images from text prompts. It is aimed at developers who need to automate media workflows, run OCR, or perform pixel-level image segmentation and object detection from within their development environment.