What audio formats can I transcribe?

The skill supports common formats including WAV, MP3, M4A, FLAC, and OGG, and can also extract audio from video files like MP4.
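As a rough illustration of the video case, the sketch below builds an ffmpeg command that extracts a mono 16 kHz WAV track from a video file. The function name `build_extract_command` and the 16 kHz default are assumptions made for this example; the listing does not say how the skill itself performs the extraction.

```python
from pathlib import Path


def build_extract_command(video_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes the audio track of a video as mono WAV.

    Hypothetical helper for illustration; not taken from the skill's own scripts.
    """
    src = Path(video_path)
    dst = src.with_suffix(".wav")  # talk.mp4 -> talk.wav, same directory
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(src),            # input video file
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample to the requested rate
        str(dst),
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.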

Can I translate audio from other languages into English?

Absolutely. Using the Whisper engine's translation task, you can transcribe non-English audio directly into English text.
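A minimal sketch of the translation task, assuming the `openai-whisper` Python package and ffmpeg are installed. The wrapper name `translate_to_english` and the `"base"` model default are choices made for this example, not part of the skill.

```python
def translate_to_english(audio_path: str, model_name: str = "base") -> str:
    """Return English text for speech in another language, using Whisper locally.

    Hypothetical wrapper for illustration; the skill's own script may differ.
    """
    # Imported lazily so this module can be loaded without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    # task="translate" asks Whisper for English output instead of a
    # transcript in the source language (the default is task="transcribe").
    result = model.transcribe(audio_path, task="translate")
    return result["text"].strip()
```

Larger models generally translate more accurately but take longer and need more memory.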

Does it support speaker identification?

Yes, the skill supports speaker diarization via Google Cloud and AssemblyAI, allowing you to distinguish between different people in a conversation.
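Diarization services typically return speaker turns with timestamps, which then have to be matched against transcript segments. The sketch below shows one common way to do that, by assigning each segment to the speaker turn it overlaps most. The `Span` type and function names are hypothetical, and the actual response formats of Google Cloud and AssemblyAI are not reproduced here.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str    # speaker name for a turn, spoken text for a transcript segment


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the time range shared by two spans (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def label_segments(segments: list[Span], turns: list[Span]) -> list[tuple[str, str]]:
    """Pair each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        # Fall back to a placeholder when no turn overlaps the segment at all.
        speaker = best.label if best and overlap(seg, best) > 0 else "UNKNOWN"
        labeled.append((speaker, seg.label))
    return labeled
```

For example, a segment spanning 2.0 to 5.0 seconds is attributed to a speaker whose turn runs from 2.2 to 6.0 seconds rather than to one whose turn ended at 2.2.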

Can I use this skill for free without API keys?

Yes, by selecting the Whisper engine, you can perform transcriptions locally on your machine for free without requiring external API keys.

Is real-time transcription possible?

Yes, the skill includes scripts for real-time streaming transcription using Whisper, Google, or Azure engines directly from your microphone.
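Streaming transcription usually works by cutting the incoming audio into short overlapping windows and sending each one to the engine. The sketch below shows only that windowing step on a plain sequence of samples. The function name `windows` and its parameters are assumptions for this example; microphone capture and the engine call are left out, and the skill's scripts may buffer audio differently.

```python
from typing import Iterable, Iterator


def windows(samples: Iterable[float], size: int, hop: int) -> Iterator[list[float]]:
    """Yield windows of `size` samples, advancing by `hop` samples each time.

    Hypothetical helper for illustration. A hop smaller than the size makes
    consecutive windows overlap, which helps avoid cutting words in half.
    """
    if not 0 < hop <= size:
        raise ValueError("hop must be between 1 and size")
    buffer: list[float] = []
    fresh = 0  # samples received since the last yielded window
    for sample in samples:
        buffer.append(sample)
        fresh += 1
        if len(buffer) == size:
            yield list(buffer)
            del buffer[:hop]  # keep the tail as overlap for the next window
            fresh = 0
    if fresh:
        yield buffer  # final partial window with the not-yet-emitted samples
```

With 16 kHz audio, `size=16000 * 5` and `hop=16000 * 4` would give five-second windows that overlap by one second.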

Speech-to-Text Transcription

Name: Speech-to-Text Transcription
Author: astoreyai


Productivity and Workflow

Transcribes audio files and live recordings into text using powerful engines like Whisper, Google Speech, and Azure.

This skill gives Claude Code speech-to-text capabilities for converting audio into text. It supports several engines, from OpenAI Whisper, which runs locally and keeps audio on your machine, to cloud services such as Google Cloud Speech and Azure. Users can record live audio, process existing files, identify different speakers through diarization, and generate formatted output such as SRT subtitles or structured JSON. It suits developers and teams who want to automate meeting notes, generate captions, or integrate voice-to-text workflows into their AI-assisted environment.

Key Features

01 Advanced speaker diarization to identify and label multiple participants

02 Real-time audio streaming and live recording directly from the microphone

03 1 GitHub star

04 Local transcription with Whisper for enhanced privacy and zero API costs

05 Multi-language support for over 99 languages with built-in translation

06 Support for multiple STT engines including Whisper, Google, Azure, and AssemblyAI

Use Cases

01 Automating the transcription of voice notes into structured documentation or task lists

02 Transforming meeting recordings into formatted markdown notes with speaker identification

03 Generating accurate SRT or VTT subtitle files for video content and accessibility
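For the subtitle use case, the following sketch turns timed transcript segments into SRT text. It assumes each segment is a dict with `start`, `end` (in seconds), and `text` keys, which is the shape Whisper uses for its segments; the function names are hypothetical and the skill's own formatter may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):  # SRT cue numbers start at 1
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

WebVTT output is similar, but it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.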

Install with 🐟 Skill.Fish

npx skillfish add astoreyai/claude-skills utility
