A Python toolkit that processes audio files into word-level, forced-aligned, speaker-labeled CSV transcripts. Built on WhisperX, it goes beyond one-shot transcription: additional modules format these transcripts into readable scripts, split audio into segments defined in a YAML configuration, and run fuzzy searches over transcript content. It suits iterative workflows in which users extract, analyze, and refine audio data.
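To make the transcript-to-script step concrete, here is a minimal sketch of how word-level, speaker-labeled rows could be grouped into readable script lines. It is not the toolkit's own code: the column names (`word`, `start`, `end`, `speaker`), the sample data, and the `format_script` helper are all assumptions made for illustration, and the real CSV layout may differ.

```python
import csv
import io

# Hypothetical word-level transcript; the toolkit's actual columns may differ.
SAMPLE_CSV = """word,start,end,speaker
Hello,0.00,0.40,SPEAKER_00
there,0.45,0.80,SPEAKER_00
Hi,1.10,1.30,SPEAKER_01
again,1.35,1.70,SPEAKER_01
Welcome,2.00,2.50,SPEAKER_00
"""


def format_script(csv_text):
    """Group consecutive words from the same speaker into one script line."""
    lines = []
    current_speaker = None
    words = []
    turn_start = 0.0

    def flush():
        # Emit the turn collected so far, if any.
        if words:
            lines.append(f"[{turn_start:.2f}] {current_speaker}: {' '.join(words)}")

    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["speaker"] != current_speaker:
            flush()
            current_speaker = row["speaker"]
            words = []
            turn_start = float(row["start"])  # timestamp of the turn's first word
        words.append(row["word"])
    flush()
    return lines


if __name__ == "__main__":
    for line in format_script(SAMPLE_CSV):
        print(line)
```

Each output line carries the start time of the speaker's turn, so a reader can jump back to the matching point in the audio.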
Key Features
01 Transcribe audio with speaker diarization (WhisperX-based)
02 Format CSV transcripts into human-readable scripts
03 Split audio into custom segments via YAML configuration
04 Integrates as an MCP server for Claude Code and other MCP clients
05 Perform fuzzy searches on transcript content by word or phrase
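As an illustration of fuzzy phrase search over transcript words, the sketch below slides a window across the word list and scores each window against the query with the standard library's `difflib.SequenceMatcher`. This is one possible approach, not necessarily the one the toolkit uses; the `fuzzy_find` name, its parameters, and the 0.8 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def fuzzy_find(words, phrase, threshold=0.8):
    """Return (index, matched_text, score) for each window resembling phrase.

    words     -- transcript words in order (e.g. the word column of the CSV)
    phrase    -- word or phrase to look for
    threshold -- minimum similarity ratio in [0, 1]; an assumed default
    """
    target = phrase.lower()
    n = len(phrase.split())
    if n == 0:
        return []

    hits = []
    # Compare every run of n consecutive words against the query.
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        score = SequenceMatcher(None, window.lower(), target).ratio()
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    transcript = "we should recieve the audio files tomorrow".split()
    # The misspelled "recieve" still matches the correctly spelled query.
    for index, text, score in fuzzy_find(transcript, "receive the audio"):
        print(index, text, score)
```

The returned index points at the first word of the match, so it can be mapped back to that word's timestamp and speaker in the transcript.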