What is the benefit of word-level alignment?

Word-level alignment provides precise start and end times for every single word spoken, which is essential for high-quality captioning and advanced data analysis.
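
As a rough illustration of why per-word timing matters for captioning, the sketch below groups timed words into caption cues. The `Word` record, the `group_into_cues` helper, and the thresholds are hypothetical and chosen for this example; they are not the skill's actual output schema or logic.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record: text plus start/end times in seconds."""
    text: str
    start: float
    end: float


def group_into_cues(words, max_chars=42, max_gap=0.6):
    """Group timed words into (start, end, text) caption cues.

    A new cue begins when adding the next word would exceed max_chars,
    or when the silence before it is longer than max_gap seconds.
    Both thresholds are illustrative defaults.
    """
    cues = []
    current = []
    for word in words:
        if current:
            text_len = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            gap = word.start - current[-1].end
            if text_len > max_chars or gap > max_gap:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)
    # Each cue spans from its first word's start to its last word's end.
    return [(c[0].start, c[-1].end, " ".join(w.text for w in c)) for c in cues]


words = [
    Word("Hello", 0.10, 0.42),
    Word("everyone,", 0.48, 0.95),
    Word("welcome", 1.90, 2.30),
    Word("back.", 2.35, 2.70),
]

for start, end, text in group_into_cues(words):
    print(f"{start:.2f} -> {end:.2f}  {text}")
# 0.10 -> 0.95  Hello everyone,
# 1.90 -> 2.70  welcome back.
```

With only segment-level timestamps, the pause before "welcome" could not be used as a cue boundary; word-level times make that split possible.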

What file formats are supported for transcription?

The skill supports common audio formats like MP3, WAV, FLAC, M4A, and OGG, as well as video formats like MP4, MKV, MOV, and AVI from which audio is automatically extracted.
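
The format handling described above can be pictured as a simple extension check followed by an extraction step for video inputs. This is a minimal sketch: the helper names are hypothetical, and the use of `ffmpeg` (with mono, 16 kHz output) is an assumption, since the listing does not say how the skill extracts audio.

```python
from pathlib import Path

# Extensions taken from the formats listed in this FAQ.
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi"}


def classify_media(path):
    """Return 'audio' or 'video' based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Unsupported file type: {ext or '(no extension)'}")


def extraction_command(video_path, wav_path):
    """Build (but do not run) an ffmpeg command that extracts audio.

    Assumption: ffmpeg is the extraction tool. -vn drops the video stream,
    -ac 1 downmixes to mono, and -ar 16000 resamples to 16 kHz.
    """
    return [
        "ffmpeg", "-i", str(video_path),
        "-vn", "-ac", "1", "-ar", "16000",
        str(wav_path),
    ]


print(classify_media("meeting.MKV"))
print(extraction_command("meeting.mkv", "meeting.wav"))
```

The command is returned as a list so it could be passed to `subprocess.run` without shell quoting issues.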

Do I need to manually install the Whisper model?

No. The skill uses Python 3.12 and the 'uv' package manager to automatically download and manage the necessary WhisperX models during the first run.

Can this skill generate subtitles for videos?

Yes, it can output transcriptions in SRT and VTT formats, which are industry standards for video subtitles.
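
The two subtitle formats differ mainly in their header and timestamp punctuation: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` line and uses a period. The sketch below renders both from a list of `(start, end, text)` tuples; the function names are hypothetical and this is not the skill's own exporter.

```python
def format_timestamp(seconds, decimal_mark):
    """Format seconds as HH:MM:SS<mark>mmm (mark is ',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(cues):
    """Render cues as SRT: numbered blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(cues):
    """Render cues as WebVTT: a WEBVTT header, then unnumbered blocks."""
    blocks = ["WEBVTT\n"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)


cues = [(0.10, 0.95, "Hello everyone,"), (1.90, 2.70, "welcome back.")]
print(to_srt(cues))
print(to_vtt(cues))
```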

Does it support languages other than English?

Yes. It supports many languages, including Chinese and Japanese, and offers an auto-detect option that identifies the spoken language.

Audio Transcriber

Name: Audio Transcriber
Author: maxgent-ai

Data Science and Machine Learning

Converts audio and video files into accurate text transcripts with precise word-level timestamps using WhisperX.

The Audio Transcriber skill enables Claude to perform speech-to-text conversion directly within your development environment. By leveraging WhisperX, it provides multi-language support and tighter word-level alignment than standard Whisper implementations. Whether you need to generate SRT subtitles for a video, transcribe a recorded meeting, or extract structured JSON data from speech, the skill automates the entire process, including audio extraction from video files and voice activity detection (VAD) filtering to keep non-speech segments out of the transcript.

Key Features

01. Support for various formats including MP3, WAV, MP4, and MKV

02. High-precision word-level timestamp alignment

03. Multiple model sizes ranging from 'tiny' for speed to 'large-v2' for accuracy

04. Multi-language support with automatic speech detection

05. Export options for TXT, SRT, VTT, and structured JSON

06. 0 GitHub stars

Use Cases

01. Transcribing technical meetings or interviews into searchable documentation

02. Extracting dialogue from media files for programmatic data analysis

03. Generating professional subtitle files (SRT/VTT) for video content

Install with 🐟 Skill.Fish

npx skillfish add maxgent-ai/maxgent-plugin audio-transcribe

For use in Claude.ai and ChatGPT
