Introduction
This skill lets Claude work with multimedia assets through the Google Gemini API. It provides a single interface for four tasks: transcribing long-form audio, analyzing video at the scene level, extracting structured data from multi-page PDFs, and generating images from text prompts. Because it supports context windows of up to 2M tokens, developers can build features that depend on understanding non-textual data, or create visual assets, directly within their coding workflow.
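As a rough illustration of what a single interface over these four tasks could look like, the sketch below picks a task from the input file's media type. It is a minimal sketch under stated assumptions, not the skill's actual implementation: the task names and the `pick_task` helper are hypothetical, and no Gemini API call is made.

```python
import mimetypes
from typing import Optional

# Hypothetical task names, invented for illustration only.
# They are not part of the skill or of the Gemini API.
TASK_BY_MEDIA_PREFIX = {
    "audio/": "transcribe_audio",
    "video/": "analyze_video",
    "application/pdf": "extract_document",
}


def pick_task(path: Optional[str]) -> str:
    """Choose a task name from the input file's guessed media type.

    Assumption: a request with no input file is a text-to-image request.
    """
    if path is None:
        return "generate_image"

    # Guess the media type from the file extension alone.
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot determine media type for {path!r}")

    for prefix, task in TASK_BY_MEDIA_PREFIX.items():
        if mime.startswith(prefix):
            return task
    raise ValueError(f"unsupported media type {mime!r} for {path!r}")


if __name__ == "__main__":
    for candidate in ("talk.mp3", "demo.mp4", "report.pdf", None):
        print(candidate, "->", pick_task(candidate))
```

Routing on the guessed media type keeps the caller's side to one entry point; a real implementation would likely also inspect file contents rather than trusting the extension.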