What is the maximum audio length this skill can process?

The Gemini Audio skill can process up to 9.5 hours of audio content per request, making it ideal for long-form podcasts or meetings.
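
As a rough illustration of that limit, the sketch below measures a WAV file's duration with Python's standard `wave` module and checks it against the 9.5-hour figure. The helper names `wav_duration_seconds` and `fits_in_one_request` are hypothetical and not part of the skill; other formats would need a different way of reading the duration.

```python
import io
import wave

# Limit quoted in the answer above: 9.5 hours per request.
MAX_AUDIO_SECONDS = 9.5 * 3600


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_in_one_request(duration_seconds: float) -> bool:
    """Hypothetical helper: True if the audio is within the 9.5-hour limit."""
    return duration_seconds <= MAX_AUDIO_SECONDS
```

A recording that fails this check would have to be split into shorter segments before being sent.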

Can I use this with Google Vertex AI?

Yes, the skill is compatible with both Google AI Studio and Vertex AI. You can switch by setting the GEMINI_USE_VERTEX environment variable.
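
The listing does not say which values `GEMINI_USE_VERTEX` accepts, so the following is only a minimal sketch of how such a switch might be read. The set of truthy strings, the function name `select_backend`, and the returned labels are all assumptions for illustration, not the skill's actual behavior.

```python
import os

# Assumed truthy spellings; the skill's real parsing is not documented here.
TRUTHY = {"1", "true", "yes", "on"}


def select_backend(env=None) -> str:
    """Hypothetical helper: pick a backend label from GEMINI_USE_VERTEX."""
    env = os.environ if env is None else env
    flag = env.get("GEMINI_USE_VERTEX", "").strip().lower()
    return "vertex-ai" if flag in TRUTHY else "ai-studio"
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.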

How is the audio processing priced?

Pricing is token-based. For example, Gemini 2.5 Flash costs approximately $1.00 per 1M input tokens, and 1 minute of audio is roughly 1,920 tokens (32 tokens per second).
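
To make the arithmetic concrete, here is a small estimate using only the two figures quoted above (about 1,920 tokens per minute of audio and about $1.00 per 1M input tokens). The function names are hypothetical, and the estimate ignores output tokens and any text in the prompt, so treat it as a lower bound rather than a quote.

```python
TOKENS_PER_AUDIO_MINUTE = 1_920      # figure quoted above
USD_PER_MILLION_INPUT_TOKENS = 1.00  # approximate Gemini 2.5 Flash rate quoted above


def audio_tokens(minutes: float) -> int:
    """Approximate input tokens for a recording of the given length."""
    return round(minutes * TOKENS_PER_AUDIO_MINUTE)


def estimate_input_cost_usd(minutes: float) -> float:
    """Approximate audio input cost in USD (output tokens not included)."""
    return audio_tokens(minutes) * USD_PER_MILLION_INPUT_TOKENS / 1_000_000


# A 9.5-hour recording is 570 minutes: 1,094,400 tokens, about $1.09.
print(audio_tokens(570), round(estimate_input_cost_usd(570), 2))
```

By the same calculation, a one-hour meeting comes to 115,200 tokens, or roughly $0.12 of audio input.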

Does this skill support speaker identification?

Yes, the analysis features include the ability to identify different speakers and extract structured dialogue from recordings.

What audio formats are supported by the Gemini Audio skill?

It supports a wide variety of common formats including WAV, MP3, AAC, FLAC, OGG Vorbis, and AIFF.
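
A caller might want to confirm that a file is in one of these formats before uploading it. The sketch below maps file extensions to MIME types for the formats listed above; the exact MIME strings and the `.aif` alias are assumptions, and `audio_mime_type` is a hypothetical helper rather than part of the skill.

```python
from pathlib import Path
from typing import Optional

# Extension-to-MIME mapping for the formats named above (MIME strings assumed).
AUDIO_MIME_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mp3",
    ".aac": "audio/aac",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
    ".aiff": "audio/aiff",
    ".aif": "audio/aiff",
}


def audio_mime_type(filename: str) -> Optional[str]:
    """Return the assumed MIME type for a supported extension, else None."""
    return AUDIO_MIME_TYPES.get(Path(filename).suffix.lower())
```

A `None` result signals that the file should be converted to a supported format first.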

Gemini Audio Integration

Name: Gemini Audio Integration
Author: kienhaminh

Data Science & ML

Implements comprehensive audio processing and text-to-speech generation using the Google Gemini API.

The Gemini Audio skill gives Claude audio capabilities: it can transcribe, summarize, and analyze audio files up to 9.5 hours long. It supports a wide range of formats and can distinguish between speech, music, and ambient sounds, and it also provides text-to-speech generation with controllable voice styles. It is aimed at developers building applications that need audio understanding, meeting transcription, or natural-sounding voice responses via Google AI Studio or Vertex AI.

Key Features

01 Seamless integration with both Google AI Studio and Vertex AI endpoints

02 High-fidelity transcription with timestamps and multi-speaker identification

03 Support for multiple formats including MP3, WAV, FLAC, and AAC

04 0 GitHub stars

05 Long-form audio analysis supporting up to 9.5 hours per request

06 Controllable Text-to-Speech (TTS) with adjustable style, pace, and tone

Use Cases

01 Automating meeting and podcast transcriptions with speaker detection

02 Generating natural-sounding narration or voiceovers for applications

03 Extracting insights and summaries from long-form audio recordings

Install with 🐟 Skill.Fish

npx skillfish add kienhaminh/speed-reader gemini-audio
