What hardware is required for this skill?

The skill is optimized for NVIDIA GPUs with at least 6GB of VRAM (such as the RTX 3090) to run the whisper.cpp large-v3 model efficiently.

Can I use this skill in Python scripts?

Yes, the skill includes a Python reference implementation using the requests library to send audio files to the local inference endpoint on port 5555.
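
As a rough illustration of such a call, the sketch below posts an audio file to the local server. The skill's own reference implementation uses the requests library; this version sticks to the Python standard library so it runs without extra packages. Port 5555 and the /inference path come from this page. The host (127.0.0.1), the `file` and `response_format` form fields, and the `text` key in the JSON reply are assumptions based on the stock whisper.cpp server and may differ in the skill's setup. The function names are hypothetical.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Port 5555 and /inference are taken from this page; the host is an assumption.
SERVER_URL = "http://127.0.0.1:5555/inference"


def build_multipart(fields, file_field, file_name, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{file_name}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path, url=SERVER_URL, timeout=300):
    """POST an audio file to the local server and return the transcript text."""
    path = Path(audio_path)
    # Assumed form fields: "file" for the audio, "response_format" for JSON output.
    body, content_type = build_multipart(
        {"response_format": "json"}, "file", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumed response shape: {"text": "..."}.
    return payload.get("text", "")
```

Calling `transcribe("note.wav")` with the server running would return the transcript as a string, provided the assumptions above hold for your installation.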

Is my audio data sent to the cloud?

No. All transcription runs locally on your machine through a whisper.cpp server, so your audio is never sent to an external service.

How do I handle Out-of-Memory (OOM) errors?

If your GPU hits VRAM limits, the skill suggests waiting 30-60 seconds for other services to unload or checking GPU usage with nvidia-smi.
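
The wait-and-retry advice above can be sketched in Python roughly as follows. This is an illustration, not the skill's own code: the helper names are hypothetical, the default 45-second wait is simply a value inside the suggested 30-60 second range, and treating any exception as a possible OOM is an assumption, since this page does not say how the server reports one. The `nvidia-smi` query flags are standard options of that tool.

```python
import subprocess
import time


def parse_gpu_memory(csv_output):
    """Parse 'used, total' MiB pairs, one line per GPU."""
    usage = []
    for line in csv_output.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        usage.append((used, total))
    return usage


def gpu_memory_usage():
    """Ask nvidia-smi for used and total VRAM (MiB) on each GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


def retry_after_wait(action, attempts=3, wait_seconds=45, sleep=time.sleep):
    """Call action(); on failure, wait and try again.

    Assumption: any exception may indicate an OOM condition. The last
    error is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(wait_seconds)
```

A transcription call could then be wrapped as `retry_after_wait(lambda: send_audio("note.wav"))`, where `send_audio` stands for whatever function posts the file to the server.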

Which audio formats are supported?

The server accepts common formats like WAV and MP3, though 16kHz mono WAV is the optimal format for the whisper-server.
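
One way to check whether a file is already in the preferred format, and to prepare a conversion when it is not, is sketched below. The check uses Python's standard `wave` module. The conversion step assumes ffmpeg is installed, which this page does not mention; the function only builds the command line (using standard ffmpeg options) and leaves running it to the caller. Both function names are hypothetical.

```python
import wave


def is_16khz_mono_wav(path):
    """Return True if path is a readable WAV file at 16 kHz with one channel."""
    try:
        with wave.open(str(path), "rb") as wav:
            return wav.getframerate() == 16000 and wav.getnchannels() == 1
    except (wave.Error, EOFError):
        # Not a PCM WAV file that the wave module can parse (for example, an MP3).
        return False


def ffmpeg_convert_command(source, target):
    """Build an ffmpeg command that resamples to 16 kHz mono 16-bit PCM WAV.

    Assumption: ffmpeg is available on PATH; this page does not require it.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(source),
        "-ar", "16000",      # sample rate: 16 kHz
        "-ac", "1",          # channels: mono
        "-c:a", "pcm_s16le", # codec: 16-bit little-endian PCM
        str(target),
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` before sending the converted file to the server.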

Whisper Audio Transcription

Author: lawless-m

Category: Data Science and ML

Transcribes audio files locally using whisper.cpp with CUDA acceleration for high-performance speech-to-text conversion.

Whisper Audio Transcription is a skill that connects the Claude environment to a local whisper.cpp server for high-accuracy speech-to-text. It runs the large-v3 model with CUDA GPU acceleration and exposes fast, private transcription through a simple HTTP API. It suits developers who want to automate transcription tasks, record and convert voice notes, or add audio processing to Python and shell workflows without relying on external cloud APIs or paying subscription costs.

Key Features

01 Comprehensive Python and Shell script integration examples
02 0 GitHub stars
03 Local GPU-accelerated transcription using CUDA
04 High-accuracy speech-to-text with the large-v3 model
05 Simple REST API integration via HTTP POST /inference
06 Supports common audio formats including WAV and MP3

Use Cases

01 Integrating speech-to-text into local application backends
02 Automating meeting or voice note transcriptions
03 Private, offline processing of sensitive audio data


Install with 🐟 Skill.Fish

npx skillfish add lawless-m/gwen Whisper-Transcription
