About
This skill provides a unified interface to Google Gemini 2.0 and 2.5 models for analyzing and generating multimedia content. It lets Claude transcribe long-form audio (up to 9.5 hours), detect objects in videos, extract structured data from multi-page PDFs, and generate images from text prompts. It supports context windows of up to 2M tokens and includes specialized scripts for media optimization, which makes it useful for developers who build multimodal AI features or need deep analysis of non-text assets in their development workflow.
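As a rough illustration of how the limits quoted above might be checked before a request is sent, here is a minimal sketch. The `preflight` helper, the `PreflightResult` type, and the constant names are hypothetical and not part of this skill or of any Gemini SDK. The numbers are simply the ones stated in the description, and actual limits may vary by model.

```python
from dataclasses import dataclass, field
from typing import List

# Limits as quoted in the description; treat them as assumptions,
# since real limits depend on the specific Gemini model.
MAX_AUDIO_HOURS = 9.5
MAX_CONTEXT_TOKENS = 2_000_000


@dataclass
class PreflightResult:
    """Outcome of a local check; `reasons` lists every limit exceeded."""
    ok: bool
    reasons: List[str] = field(default_factory=list)


def preflight(audio_seconds: float = 0.0, estimated_tokens: int = 0) -> PreflightResult:
    """Hypothetical helper: compare a planned request against the quoted limits."""
    reasons: List[str] = []
    max_audio_seconds = MAX_AUDIO_HOURS * 3600
    if audio_seconds > max_audio_seconds:
        reasons.append(
            f"audio is {audio_seconds / 3600:.2f} h, above the {MAX_AUDIO_HOURS} h limit"
        )
    if estimated_tokens > MAX_CONTEXT_TOKENS:
        reasons.append(
            f"{estimated_tokens} tokens exceeds the {MAX_CONTEXT_TOKENS} token window"
        )
    return PreflightResult(ok=not reasons, reasons=reasons)


if __name__ == "__main__":
    # A 10-hour recording is over the quoted audio limit.
    print(preflight(audio_seconds=10 * 3600))
```

A check like this only catches oversized inputs early. It does not replace the media optimization scripts mentioned in the description, which would be the place to split or compress a file that fails the check.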