Implements high-accuracy speech-to-text transcription, translation, and speaker diarization using OpenAI's audio models.
This skill provides Claude Code with standardized patterns and best practices for integrating OpenAI's audio APIs into applications. It enables developers to implement robust transcription and translation features, handling complex tasks like speaker diarization, word-level timestamps, and subtitle generation (SRT/VTT). The skill also includes critical logic for managing large audio files through automated chunking and context preservation, ensuring high-quality, reliable outputs across various audio formats and file sizes.
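As a minimal sketch, the two endpoints can be called via the official `openai` Python SDK (v1 client). The `transcribe` helper and the 25 MB guard below are illustrative conveniences, not part of this skill; verify model names against the current API docs:

```python
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # per-request upload limit documented by OpenAI

def needs_chunking(size_bytes: int) -> bool:
    """True when a file exceeds the upload limit and must be split first."""
    return size_bytes > MAX_UPLOAD_BYTES

def transcribe(path: str, translate: bool = False) -> str:
    """Transcribe an audio file; with translate=True, output English text."""
    import os
    from openai import OpenAI  # reads OPENAI_API_KEY from the environment

    if needs_chunking(os.path.getsize(path)):
        raise ValueError("file over 25 MB: split it into chunks first")
    client = OpenAI()
    with open(path, "rb") as audio:
        if translate:
            # The translations endpoint always produces English output.
            return client.audio.translations.create(model="whisper-1", file=audio).text
        return client.audio.transcriptions.create(model="whisper-1", file=audio).text
```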
Key Features
- Advanced model selection for accuracy vs. cost optimization
- Support for SRT and VTT subtitle format generation
- Multi-speaker diarization and identification patterns
- Word-level and segment-level timestamping
- Automated audio chunking for files exceeding 25 MB
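The chunking feature above can be sketched as a boundary planner: windows overlap slightly so speech at a cut point is not lost mid-word, preserving context between chunks. The function name and default durations are hypothetical choices for illustration:

```python
def plan_chunks(duration_s: float, chunk_s: float = 600.0, overlap_s: float = 5.0):
    """Return (start, end) windows covering the audio, each at most chunk_s
    long, with overlap_s seconds of overlap to preserve boundary context."""
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:
            break
        # Back up so the next chunk re-covers the tail of this one.
        start = end - overlap_s
    return bounds
```

Each window is then exported (e.g. with ffmpeg) and transcribed separately; the overlapping seconds let downstream merging deduplicate words cut at a boundary.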
Use Cases
- Translating non-English audio recordings directly into English text
- Creating accessibility-compliant subtitles for video content and editing workflows
- Generating searchable transcripts for meetings, interviews, and podcasts
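For the subtitle use case, segment-level timestamps map directly onto SRT blocks. A small formatter, assuming segments shaped like the API's `verbose_json` output (`start`, `end`, `text`); the helper names are illustrative:

```python
def srt_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timecode, e.g. 3661.5 -> '01:01:01,500'."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments) -> str:
    """Build an SRT document from dicts with 'start', 'end', and 'text'."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        cue = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{i}\n{cue}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"
```

VTT output differs mainly in its `WEBVTT` header and the use of `.` instead of `,` in timecodes, so the same segment data serves both formats.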