Overview
This skill integrates Google Gemini’s multimodal capabilities into the Claude environment. It supports transcription of audio up to 9.5 hours long, video analysis (including YouTube URLs), and image generation. It also provides a unified interface for extracting structured data from PDFs, performing object detection, and answering questions about visual content. It is intended for developers who need to bring multimedia content into text-based AI workflows.