Introduction
This skill gives Claude multimodal capabilities through Google's Gemini API (2.0 and 2.5 series). It supports analysis of audio files up to 9.5 hours long, video up to 6 hours long, and vision-based extraction from PDF documents up to 1,000 pages. It also provides text-to-image generation and refinement. Typical uses include automating media transcription, extracting structured data from visual documents, and integrating AI-driven image creation directly into coding workflows.
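
As a rough illustration of the limits quoted above, the following minimal sketch checks an input locally before it is sent for analysis. The names `MediaInput` and `within_limits` are hypothetical and are not part of this skill or of the Gemini API. The constants simply restate the figures from this introduction and have not been verified against the live service.

```python
from dataclasses import dataclass

# Limits as stated in the introduction; assumed, not checked against the API.
MAX_AUDIO_HOURS = 9.5
MAX_VIDEO_HOURS = 6.0
MAX_PDF_PAGES = 1000


@dataclass
class MediaInput:
    """Hypothetical description of a file to analyze."""
    kind: str    # "audio", "video", or "pdf"
    size: float  # duration in hours for audio/video, page count for pdf


def within_limits(item: MediaInput) -> bool:
    """Return True if the input fits the limits quoted in this document."""
    limits = {
        "audio": MAX_AUDIO_HOURS,
        "video": MAX_VIDEO_HOURS,
        "pdf": MAX_PDF_PAGES,
    }
    if item.kind not in limits:
        raise ValueError(f"unsupported media kind: {item.kind!r}")
    # A size of zero or less is treated as invalid rather than as "fits".
    return 0 < item.size <= limits[item.kind]
```

A caller might run `within_limits(MediaInput("video", 6.5))` before uploading and split or reject the file when the result is `False`.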