Overview
This skill gives Claude a unified interface to Google Gemini's multimodal models, so that audio, video, documents, and images can be analyzed from within a coding workflow. It supports audio transcription, analysis of videos up to 6 hours long, structured data extraction from multi-page PDFs, and image generation and editing. By turning raw media files into text-based output that code can act on, it is useful for developers who build media-intensive applications, automate document workflows, or need visual and audio understanding.