Transcribes audio and video files using OpenAI Whisper with support for multiple formats, speaker diarization, and timing synchronization.
This skill lets Claude handle professional audio and video transcription workflows using OpenAI Whisper and its high-performance variants, such as whisper.cpp and Insanely Fast Whisper. It provides production-ready patterns for generating subtitles in SRT and VTT formats, extracting word-level timing as JSON, and performing speaker diarization on multi-speaker content. Whether you are preparing content for Final Cut Pro or automating batch transcription of a large media library, the skill offers FFmpeg pre-processing recipes and model selection guidance to balance processing speed against transcript accuracy.
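As a rough illustration of the FFmpeg pre-processing step, the sketch below builds a command that converts any input to 16 kHz mono 16-bit PCM WAV. The function names `build_ffmpeg_command` and `extract_audio` are hypothetical and not taken from the skill itself; the sketch also assumes `ffmpeg` is installed and available on PATH.

```python
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Return an ffmpeg argument list that writes 16 kHz mono 16-bit PCM WAV.

    Hypothetical helper for illustration; not part of the skill.
    """
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it already exists
        "-i", src,             # input audio or video file
        "-vn",                 # drop any video stream
        "-ac", "1",            # downmix to a single (mono) channel
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # encode as 16-bit little-endian PCM
        dst,
    ]


def extract_audio(src: str, dst: str) -> str:
    """Run ffmpeg (assumed to be on PATH) and return the output path."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Keeping the command builder separate from the `subprocess.run` call makes the arguments easy to inspect or log before any file is touched.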
Key Features
01 3 GitHub stars
02 Automated audio extraction and pre-processing using FFmpeg for 16kHz mono optimization
03 Multi-format export including SRT, WebVTT, and JSON with word-level timestamps
04 Support for OpenAI Whisper, whisper.cpp, and GPU-accelerated transcription engines
05 Frame-accurate timing synchronization for professional NLE video editing workflows
06 Advanced speaker diarization to identify and label different speakers in a recording
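To make the SRT export concrete, here is a minimal sketch that turns timed segments into SRT text. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` field, which is the shape the `openai-whisper` Python package uses for its segments; the helper names `srt_timestamp` and `segments_to_srt` are hypothetical and not taken from the skill.

```python
from typing import Dict, Iterable


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Dict]) -> str:
    """Render segments as SRT cues, numbered from 1 and separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 4.25, "text": " Welcome to the show."},
    ]
    print(segments_to_srt(demo))
```

A WebVTT variant would differ mainly in starting the file with a `WEBVTT` header line and using a period instead of a comma before the milliseconds.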
Use Cases
01 Automating the transcription of meeting recordings, interviews, and lectures for documentation
02 Batch processing media libraries to extract searchable text and metadata
03 Generating high-accuracy subtitles and closed captions for professional video production
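For the batch-processing use case, one possible starting point is a helper that walks a media library and lists the files that still lack a subtitle file. This is only a sketch: the extension set, the function names `find_media_files` and `pending_files`, and the convention of writing the `.srt` next to the source file are all assumptions, not behavior documented by the skill.

```python
from pathlib import Path
from typing import Iterator, List

# Assumed set of extensions to treat as media; adjust for your library.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mov", ".mkv"}


def find_media_files(root: str) -> Iterator[Path]:
    """Yield media files under root, recursively, in sorted order."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            yield path


def pending_files(root: str, out_suffix: str = ".srt") -> List[Path]:
    """Return media files that have no sibling output file yet.

    Assumes outputs are written beside the source with the same stem.
    """
    return [
        path
        for path in find_media_files(root)
        if not path.with_suffix(out_suffix).exists()
    ]
```

Skipping files that already have an output makes a long batch run safe to interrupt and resume.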