Overview
This skill lets Claude work with multimedia assets through the Google Gemini API (Gemini 1.5, 2.0, and 2.5 models). It provides a single interface for long-form video analysis, timestamped audio transcription, OCR, and native PDF vision processing. It also supports image generation and editing. Use it when building AI-driven features or when extracting structured data from media files in a terminal environment.