About
This skill brings Google Gemini's multimodal capabilities into Claude. It can analyze audio files up to 9.5 hours long, process videos up to 6 hours long, and extract structured data from multi-page PDFs. A single interface covers tasks such as timestamped transcription, object detection, visual question answering, and high-fidelity text-to-image generation. Whether you are automating data entry from scanned forms or building video summarization pipelines, the skill supplies the patterns and scripts that complex media-processing workflows need.
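Because long recordings are the main constraint, a batch job might check file lengths before uploading anything. The following is a minimal sketch, assuming you already know each file's duration (for example, from a tool like ffprobe). `MediaFile`, `check_duration`, and `MAX_SECONDS` are hypothetical names, not part of this skill's scripts. The ceilings simply mirror the figures quoted above, and Gemini's actual limits may differ by model and change over time.

```python
from dataclasses import dataclass

# Hypothetical duration ceilings, copied from the figures quoted in the
# description above (in seconds). Verify against current Gemini docs.
MAX_SECONDS = {
    "audio": 9.5 * 3600,  # 9.5 hours
    "video": 6 * 3600,    # 6 hours
}


@dataclass
class MediaFile:
    # Hypothetical record for a file queued for analysis.
    path: str
    kind: str          # "audio" or "video"
    duration_s: float  # length in seconds, measured elsewhere


def check_duration(media: MediaFile) -> tuple[bool, str]:
    """Return (ok, reason) for one file against its stated ceiling."""
    limit = MAX_SECONDS.get(media.kind)
    if limit is None:
        # Only audio and video have duration ceilings in this sketch.
        return False, f"unsupported media kind: {media.kind!r}"
    if media.duration_s > limit:
        over = media.duration_s - limit
        return False, f"{media.path} exceeds the {media.kind} limit by {over:.0f} s"
    return True, "ok"


if __name__ == "__main__":
    queue = [
        MediaFile("interview.mp3", "audio", 3 * 3600),
        MediaFile("conference.mp4", "video", 7 * 3600),
    ]
    for item in queue:
        ok, reason = check_duration(item)
        print(item.path, ok, reason)
```

Files that fail the check could be split into shorter segments before being sent for transcription or summarization. This sketch does not handle that splitting step.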