About
This skill integrates advanced multimodal capabilities into the Claude Code environment by leveraging Google's Gemini 2.0 and 2.5 models. It provides a unified interface for complex media tasks: developers can transcribe hours of audio, perform OCR on multi-page PDFs, analyze video scenes with temporal accuracy, and generate high-fidelity images directly from text prompts. With support for context windows of up to 2 million tokens, it is well suited to building AI-powered features that require deep understanding of diverse media formats and structured data extraction.