What audio formats can I transcribe?

The skill supports most common audio formats including MP3, M4A, WAV, OGG, FLAC, and WebM.
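As a rough illustration of the format list above, a pre-flight check could filter files by extension before handing them to Whisper. This is a minimal sketch: the helper name `is_supported_audio` is hypothetical and not part of the skill, and the extension set is simply copied from the formats named in the answer.

```python
from pathlib import Path

# Extensions copied from the format list in the answer above.
SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".webm"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the listed formats.

    Hypothetical helper for illustration; the comparison is case-insensitive.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("meeting.M4A"))  # True
print(is_supported_audio("notes.txt"))    # False
```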

What are the system requirements?

You will need Python 3.10 or higher and the OpenAI Whisper CLI installed (e.g., via 'brew install openai-whisper').
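The two requirements above can be checked from Python before running anything. The sketch below assumes the Whisper CLI is exposed on the PATH as an executable named `whisper` (the name the openai-whisper package installs); the function `check_requirements` is a hypothetical helper, not something the skill ships.

```python
import shutil
import sys


def check_requirements() -> list[str]:
    """Return a list of human-readable problems; an empty list means all good.

    Hypothetical helper: checks the Python version and looks for a
    `whisper` executable on the PATH.
    """
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or higher is required")
    if shutil.which("whisper") is None:
        problems.append("the 'whisper' executable was not found on PATH")
    return problems


for problem in check_requirements():
    print("Missing requirement:", problem)
```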

Does this skill require an OpenAI API key?

No. The skill uses the OpenAI Whisper CLI, which runs locally on your hardware, so your audio never leaves your machine and there are no API charges.

How accurate is the transcription?

Accuracy depends on the Whisper model in use. By default, the skill uses the 'medium' model, which balances processing speed against transcription accuracy.

Can I translate non-English audio to English?

Yes. With the --translate flag, the skill translates recognized speech from over 100 languages into English.
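How the skill's --translate flag reaches Whisper is not documented here. One plausible mapping, assuming the skill shells out to the `whisper` executable, is to switch the CLI's `--task` option between `transcribe` and `translate` while keeping the 'medium' model as the default. The function name `build_whisper_command` is hypothetical; `--model` and `--task` are options of the Whisper CLI itself.

```python
def build_whisper_command(
    audio_path: str, translate: bool = False, model: str = "medium"
) -> list[str]:
    """Build an argument list for the whisper CLI without running it.

    Hypothetical helper: assumes the skill's --translate flag corresponds
    to Whisper's own `--task translate` option.
    """
    task = "translate" if translate else "transcribe"
    return ["whisper", audio_path, "--model", model, "--task", task]


print(build_whisper_command("interview.m4a", translate=True))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting problems with file names that contain spaces.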

Voice Recognition (Whisper)

Name: Voice Recognition (Whisper)
Author: dvcrn

Data Science and ML

Transcribes and translates audio files locally using OpenAI Whisper CLI with support for over 100 languages.

This skill brings local speech-to-text to the Claude environment by driving the OpenAI Whisper CLI, so audio is processed without external API dependencies. Users can transcribe, translate, and summarize recordings in over 100 languages, including Chinese and English, directly on their own machine. Because processing runs locally, audio never leaves the machine and there are no API fees, which suits developers and researchers who need transcription and automated summaries of meetings, interviews, or voice notes.

Key Features

01 Local speech-to-text processing for maximum privacy and zero API costs

02 Smart summarization feature to extract key points from long recordings

03 5 GitHub stars

04 Support for 100+ languages including Chinese, English, Japanese, and Korean

05 Automatic translation of foreign language audio into English text

06 Wide format support including MP3, M4A, WAV, OGG, FLAC, and WebM

Use Cases

01 Transcribing and summarizing meeting recordings or interviews locally

02 Translating foreign language audio content into English for documentation

03 Automating bulk audio-to-text conversion workflows via the command line
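The bulk conversion use case could look roughly like the sketch below, which calls the `whisper` executable directly rather than the skill's own entry point (its interface is not described here). Both function names are hypothetical; `--model`, `--output_format`, and `--output_dir` are options of the Whisper CLI.

```python
import subprocess
from pathlib import Path

# Extensions copied from the supported-format list in this listing.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav", ".ogg", ".flac", ".webm"}


def find_audio_files(directory: str) -> list[Path]:
    """Return supported audio files directly inside `directory`, sorted by path."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_all(directory: str, output_dir: str, model: str = "medium") -> None:
    """Run the whisper CLI once per audio file, writing plain-text transcripts."""
    for audio in find_audio_files(directory):
        subprocess.run(
            ["whisper", str(audio), "--model", model,
             "--output_format", "txt", "--output_dir", output_dir],
            check=True,  # raise if whisper exits with an error for this file
        )
```

Calling `transcribe_all("recordings", "transcripts")` would process every supported file in `recordings` in turn; subfolders are not searched in this sketch.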
