Does it support industry-standard subtitle formats?

Yes, it supports SRT (SubRip), VTT (WebVTT), and JSON with word-level timing, making it compatible with most video players and editing software.
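As a rough illustration of what producing these formats involves, the sketch below turns timed segments into SRT text. It assumes each segment is a dict with `start`, `end` (in seconds) and `text` keys, which is the shape openai-whisper returns under `result["segments"]`; the helper names `format_timestamp` and `segments_to_srt` are hypothetical and not taken from the skill itself.

```python
def format_timestamp(seconds: float, sep: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT). Pass sep="." for WebVTT."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (assumed dicts with start/end/text) as SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each cue: running index, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

A WebVTT file uses the same cue layout with `.` as the millisecond separator and a `WEBVTT` header line at the top of the file.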

Can this skill handle multi-speaker conversations?

Yes, it includes implementation patterns for speaker diarization using pyannote.audio to identify and label different participants in an audio file.
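One common way to combine diarization with a transcript is to give each transcript segment the speaker whose turn overlaps it the most. The sketch below shows only that merge step in plain Python. It does not call pyannote.audio; the `turns` list of `start`/`end`/`speaker` dicts is a hypothetical intermediate format that a diarization result would first need to be converted into, and `assign_speakers` is an illustrative name rather than the skill's own code.

```python
def overlap(a_start, a_end, b_start, b_end) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each segment with the speaker whose turn overlaps it most.

    segments: dicts with start/end/text (assumed transcript shape).
    turns: dicts with start/end/speaker (hypothetical diarization shape).
    Segments that overlap no turn get speaker=None.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Majority overlap is a simple heuristic; segments that straddle a speaker change may need to be split at word level for cleaner results.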

Which Whisper model should I use for general transcription?

The 'small' model offers a great balance of speed and accuracy for general use, while 'large-v3' is recommended for final production quality where accuracy is paramount.
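When GPU memory is the limiting factor, the choice can be reduced to picking the largest model that fits. The sketch below assumes the approximate VRAM figures published in the openai/whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for the large family); treat these numbers as rough assumptions, since real usage varies with the runtime and precision. `pick_model` is a hypothetical helper, not part of the skill.

```python
# Approximate VRAM needs in GB, assumed from the openai/whisper README.
# Ordered from smallest to largest model.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def pick_model(available_vram_gb: float, headroom_gb: float = 0.0) -> str:
    """Return the largest model whose assumed VRAM need fits the budget.

    Falls back to "tiny" when nothing fits.
    """
    budget = available_vram_gb - headroom_gb
    chosen = MODEL_VRAM_GB[0][0]
    for name, needed in MODEL_VRAM_GB:
        if needed <= budget:
            chosen = name
    return chosen
```

For example, a GPU with 4 GB free would resolve to `small`, which lines up with the general-use recommendation above.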

How do I prepare video files for transcription?

The skill provides FFmpeg commands that extract audio as 16 kHz mono, which matches the sample rate and channel layout Whisper models work with internally.
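The skill's exact commands are not reproduced here, but a typical extraction along these lines can be sketched as follows. The flags are standard FFmpeg options: `-vn` drops the video stream, `-ac 1` downmixes to mono, `-ar 16000` resamples to 16 kHz, and `-c:a pcm_s16le` writes 16-bit PCM WAV. The function names and default output path are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path, wav_path):
    """Build an ffmpeg argument list for 16 kHz mono 16-bit PCM WAV output."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", str(video_path),
        "-vn",                 # ignore the video stream
        "-ac", "1",            # mono
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # uncompressed 16-bit PCM
        str(wav_path),
    ]


def extract_audio(video_path, wav_path=None):
    """Run ffmpeg (must be on PATH) and return the path of the WAV file."""
    video_path = Path(video_path)
    wav_path = Path(wav_path) if wav_path else video_path.with_suffix(".wav")
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
    return wav_path
```

Building the argument list separately from running it keeps the command easy to log and inspect before any file is touched.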

Is GPU acceleration supported for faster processing?

Yes, the skill covers Insanely Fast Whisper and CUDA-based execution to significantly speed up transcription on systems with compatible NVIDIA hardware.
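Before choosing a GPU path, a script usually needs to check whether CUDA is actually usable. The sketch below is one minimal way to do that, assuming PyTorch is the runtime in use; it falls back to the CPU when PyTorch is missing or no CUDA device is visible. `pick_device` is a hypothetical helper name.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency; absence means CPU-only
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can then be passed wherever the chosen Whisper implementation accepts a device name.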

AI Transcription & Subtitles

Name: AI Transcription & Subtitles
Author: MadAppGang

Data Science & ML

Transcribes audio and video files using OpenAI Whisper models for production-grade subtitles and timing data.

This skill provides expert guidance and code patterns for performing high-quality audio and video transcription using the OpenAI Whisper ecosystem. It covers multiple implementation paths including Python, C++, and GPU-accelerated versions, while providing specific patterns for model selection, frame-accurate timing synchronization, and subtitle generation in industry-standard formats like SRT and VTT. Whether you need batch processing for large media libraries or precise speaker diarization for interviews, this skill streamlines the integration of advanced speech-to-text capabilities into your video editing and content creation workflows.

Key Features

01 Support for SRT, VTT, and JSON timing formats with word-level precision

02 Model selection optimization based on VRAM and accuracy requirements

03 211 GitHub stars

04 Advanced patterns for speaker diarization and NLE timing synchronization

05 Multi-engine Whisper support (Python, whisper.cpp, and Insanely Fast Whisper)

06 Automated audio extraction and preprocessing with FFmpeg integration

Use Cases

01 Synchronizing AI-generated text with video frames for professional editing software like Final Cut Pro

02 Extracting searchable transcripts from large archives of recorded meetings or interviews

03 Generating multi-language subtitles and captions for professional video production
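For the frame synchronization use case, transcript times in seconds have to be mapped onto the frame grid of the editing timeline. The sketch below converts seconds to a non-drop-frame `HH:MM:SS:FF` timecode by rounding to the nearest frame. It assumes an integer frame rate such as 24, 25, or 30; fractional rates like 29.97 with drop-frame timecode are not handled, and `seconds_to_timecode` is a hypothetical helper rather than the skill's own code.

```python
def seconds_to_timecode(seconds: float, fps: int = 25) -> str:
    """Convert seconds to non-drop-frame HH:MM:SS:FF at an integer fps."""
    total_frames = round(seconds * fps)   # snap to the nearest frame
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"
```

Rounding to the nearest frame keeps each cue within half a frame of the time the model reported.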

Install with 🐟 Skill.Fish

npx skillfish add madappgang/claude-code transcription
