Which Whisper model should I use for transcription?

For general use, the 'small' model offers the best balance of speed and accuracy. Use 'tiny' for quick previews and 'large-v3' for final production-grade delivery where accuracy is paramount.
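Here is a minimal sketch of that choice using the `openai-whisper` Python package, assuming it is installed (`pip install openai-whisper`). The `MODEL_BY_USE_CASE` mapping and both helper names are hypothetical. `whisper.load_model` and `transcribe(..., word_timestamps=True)` are the package's own calls. The import is deferred so the selection helper runs without the package.

```python
"""Sketch: choosing a Whisper model size by use case (openai-whisper)."""

# Hypothetical mapping that follows the guidance above: tiny for previews,
# small as the everyday default, large-v3 for final delivery.
MODEL_BY_USE_CASE = {
    "preview": "tiny",
    "general": "small",
    "final": "large-v3",
}


def pick_model(use_case: str = "general") -> str:
    """Return the Whisper model name for a use case, defaulting to 'small'."""
    return MODEL_BY_USE_CASE.get(use_case, "small")


def transcribe(path: str, use_case: str = "general") -> dict:
    """Transcribe a file with openai-whisper.

    The import lives inside the function so pick_model() works even when
    the package is not installed.
    """
    import whisper  # pip install openai-whisper

    model = whisper.load_model(pick_model(use_case))
    # word_timestamps=True adds per-word start/end times to each segment.
    return model.transcribe(path, word_timestamps=True)
```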

Does this skill support speaker identification?

Yes, it includes implementation patterns for speaker diarization using pyannote.audio, allowing you to distinguish between different participants in a recording.
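A common way to combine the two outputs is shown below, though the skill's own method may differ. Whisper returns timed segments, while pyannote.audio returns speaker turns. With pyannote, the turns can be read as `(turn.start, turn.end, speaker)` from `diarization.itertracks(yield_label=True)`. The sketch labels each segment with the speaker whose turn overlaps it most. The function names are hypothetical, and turns are plain tuples so the code runs without pyannote installed.

```python
from typing import Iterable


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: list[dict],
    turns: Iterable[tuple[float, float, str]],
    unknown: str = "UNKNOWN",
) -> list[dict]:
    """Return copies of Whisper-style segments with a 'speaker' key added.

    Each segment takes the label of the diarization turn that overlaps it
    the most; a segment that overlaps no turn is labelled `unknown`.
    """
    turns = list(turns)
    labelled = []
    for seg in segments:
        best_label, best_overlap = unknown, 0.0
        for start, end, label in turns:
            ov = overlap(seg["start"], seg["end"], start, end)
            if ov > best_overlap:
                best_label, best_overlap = label, ov
        labelled.append({**seg, "speaker": best_label})
    return labelled
```

For example, segments at 0.0 to 2.0 s and 2.5 to 5.0 s, with turns `(0.0, 2.2, "SPEAKER_00")` and `(2.2, 5.0, "SPEAKER_01")`, are labelled `SPEAKER_00` and `SPEAKER_01` respectively.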

Is GPU acceleration supported?

Yes, the skill includes instructions for using Insanely Fast Whisper and CUDA-enabled devices to significantly speed up the transcription of long-form content.
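Insanely Fast Whisper is built on the Hugging Face `transformers` speech-recognition pipeline. It runs Whisper in half precision on the GPU and decodes 30-second chunks in batches. The sketch below approximates that approach by calling the pipeline directly and is not the tool's own CLI. The helper names and the default `batch_size` are assumptions, and the batch size should be tuned to your GPU's memory.

```python
def pick_device(cuda_available: bool) -> tuple[str, str]:
    """Return (device, dtype name): fp16 on a CUDA GPU, fp32 on CPU."""
    if cuda_available:
        return "cuda:0", "float16"
    return "cpu", "float32"


def transcribe_batched(
    path: str,
    model_id: str = "openai/whisper-large-v3",
    batch_size: int = 16,
) -> dict:
    """Chunked, batched transcription with the transformers ASR pipeline."""
    import torch
    from transformers import pipeline

    device, dtype_name = pick_device(torch.cuda.is_available())
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        torch_dtype=getattr(torch, dtype_name),
        device=device,
    )
    # Long audio is split into 30-second chunks that are decoded in parallel
    # batches; lower batch_size if the GPU runs out of memory.
    return asr(path, chunk_length_s=30, batch_size=batch_size, return_timestamps=True)
```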

What is the best way to prepare video files for transcription?

The skill recommends using FFmpeg to extract audio as a 16 kHz mono WAV file (pcm_s16le). This matches the format Whisper works with internally, since Whisper resamples all input to 16 kHz mono before transcription.
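The extraction step can be sketched as a small Python wrapper around the FFmpeg CLI. The flags are standard FFmpeg options: `-vn` drops video, `-ac 1` downmixes to mono, `-ar 16000` resamples, and `-c:a pcm_s16le` writes 16-bit PCM. The function names and the `_16k.wav` output naming are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video: str, wav: str) -> list[str]:
    """Build an FFmpeg command that writes 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", video,
        "-vn",                # drop the video stream
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # signed 16-bit little-endian PCM
        wav,
    ]


def extract_audio(video: str) -> Path:
    """Run FFmpeg and return the WAV written next to the input.

    The '_16k.wav' name avoids overwriting an input that is already a .wav.
    """
    src = Path(video)
    wav = src.with_name(src.stem + "_16k.wav")
    subprocess.run(ffmpeg_extract_cmd(str(src), str(wav)), check=True)
    return wav
```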

Can I export transcripts for use in video editors like Final Cut Pro?

Yes, the skill provides scripts to convert Whisper JSON output into frame-accurate timing data compatible with professional non-linear editors (NLEs).
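The skill's conversion scripts are not reproduced here. The sketch below shows only the core idea and assumes Final Cut Pro's FCPXML as the target, which expresses times as rational seconds such as `1001/30000s`. Each Whisper timestamp is snapped to the nearest frame boundary using exact fractions, so no floating-point drift accumulates over a long timeline. The function names are hypothetical, and the frame duration must match your project's frame rate (`1001/30000` for 29.97 fps, `1/25` for 25 fps).

```python
from fractions import Fraction


def snap_to_frame(seconds: float, frame_duration: Fraction) -> Fraction:
    """Round a time in seconds to the nearest whole frame, as exact rational seconds."""
    frames = round(Fraction(seconds) / frame_duration)
    return frames * frame_duration


def fcpxml_time(seconds: float, frame_duration: Fraction) -> str:
    """Format a frame-snapped time in FCPXML's rational 'N/Ds' style."""
    t = snap_to_frame(seconds, frame_duration)
    if t.denominator == 1:
        return f"{t.numerator}s"
    return f"{t.numerator}/{t.denominator}s"
```

Python's `Fraction` reduces results to lowest terms. At 29.97 fps, a time of 1.0 s snaps to frame 30, and `fcpxml_time` writes it as `1001/1000s`, which equals `30030/30000s`.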

Video Transcription & Subtitling

Author: MadAppGang

Category: Data Science & ML

Transcribes audio and video files using OpenAI Whisper with support for multiple formats, speaker diarization, and timing synchronization.

This skill empowers Claude to handle professional audio and video transcription workflows using OpenAI Whisper and its high-performance variants like whisper.cpp and Insanely Fast Whisper. It provides production-ready patterns for generating subtitles in SRT and VTT formats, extracting word-level timing in JSON, and performing speaker diarization for multi-speaker content. Whether you are preparing content for Final Cut Pro or automating batch transcription for a large media library, this skill offers optimized FFmpeg pre-processing and model selection guidance to balance processing speed with transcript accuracy.

GitHub stars: 3

Key Features

01 Automated audio extraction and pre-processing with FFmpeg to 16 kHz mono WAV

02 Multi-format export including SRT, WebVTT, and JSON with word-level timestamps

03 Support for OpenAI Whisper, whisper.cpp, and GPU-accelerated transcription engines

04 Frame-accurate timing synchronization for professional NLE video editing workflows

05 Advanced speaker diarization to identify and label different speakers in a recording

Use Cases

01 Automating the transcription of meeting recordings, interviews, and lectures for documentation

02 Batch processing media libraries to extract searchable text and metadata

03 Generating high-accuracy subtitles and closed captions for professional video production
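For batch processing a media library, a resumable job mainly needs to know which files still lack a transcript. The sketch below uses only the standard library and assumes transcripts are written as `.srt` files next to their source media. The extension set and function name are hypothetical.

```python
from pathlib import Path

# Hypothetical set of extensions to treat as media; adjust for your library.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mkv", ".mp3", ".wav", ".m4a"}


def pending_media(root: str, transcript_suffix: str = ".srt") -> list[Path]:
    """List media files under `root` that do not yet have a sibling transcript.

    A file such as talk.mp4 counts as done when talk.srt exists next to it,
    so re-running an interrupted batch skips work that already finished.
    """
    pending = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in MEDIA_EXTENSIONS:
            continue
        if not path.with_suffix(transcript_suffix).exists():
            pending.append(path)
    return pending
```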
