Introduction
This skill integrates Google Gemini’s multimodal capabilities into Claude Code so that users can analyze large media assets: video files up to 6 hours long, audio recordings up to 9.5 hours long, and PDF documents up to 1,000 pages. It provides a single interface for tasks such as transcription with speaker identification, object detection, pixel-level segmentation, and image generation. It also automates media optimization and structured data extraction, which makes it useful for developers who build AI-driven media processing pipelines or who need to extract insights from varied file formats without leaving their development environment.
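As a rough illustration of the size limits quoted above, the following sketch checks whether an asset fits before it is sent for analysis. It is not part of the skill: the function name `within_limits`, the constant names, and the `kind` labels are hypothetical, and only the three numeric limits come from the text.

```python
# Limits as stated in the introduction; constant names are hypothetical.
MAX_VIDEO_HOURS = 6.0
MAX_AUDIO_HOURS = 9.5
MAX_PDF_PAGES = 1000

# Maps a media kind to its maximum size (hours for video/audio, pages for pdf).
_LIMITS = {
    "video": MAX_VIDEO_HOURS,
    "audio": MAX_AUDIO_HOURS,
    "pdf": MAX_PDF_PAGES,
}


def within_limits(kind: str, size: float) -> bool:
    """Return True if an asset of the given kind and size fits the stated limit.

    `size` is a duration in hours for "video" and "audio",
    and a page count for "pdf".
    """
    if kind not in _LIMITS:
        raise ValueError(f"unknown media kind: {kind!r}")
    return 0 < size <= _LIMITS[kind]


if __name__ == "__main__":
    for kind, size in [("video", 2.5), ("audio", 10.0), ("pdf", 1000)]:
        status = "ok" if within_limits(kind, size) else "too large"
        print(f"{kind} ({size}): {status}")
```

A pre-check of this kind lets a pipeline split or reject oversized inputs early instead of failing partway through an upload.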