Transcribe video and audio content using multiple automatic speech recognition (ASR) providers, including local Whisper models and online services like JianYing (CapCut) and Bcut (Bilibili).
Video Extraction Plus extends a core video and audio transcription server with multiple automatic speech recognition (ASR) backends. Users can choose between local transcription with OpenAI's Whisper model and the online ASR services from ByteDance's JianYing (CapCut) and Bilibili's Bcut. The tool is designed for extensibility: it provides a standardized architecture for integrating new ASR providers, along with result caching, rate limiting, and status management, which makes it suitable for automating transcription workflows.
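The standardized provider architecture and result caching described above can be illustrated with a minimal Python sketch. Every name here (`ASRProvider`, `EchoProvider`, `CachedTranscriber`) is hypothetical and chosen for illustration only; the project's actual classes and cache layout may differ. The sketch assumes each provider exposes a single `transcribe` method and that results are cached by provider name plus a hash of the audio bytes.

```python
import hashlib
from abc import ABC, abstractmethod


class ASRProvider(ABC):
    """Hypothetical common interface that every ASR backend would implement."""

    name: str

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Return the transcript for the given audio bytes."""


class EchoProvider(ASRProvider):
    """Stand-in backend for illustration; it performs no real speech recognition."""

    name = "echo"

    def __init__(self) -> None:
        self.calls = 0  # counts how often the backend was actually invoked

    def transcribe(self, audio: bytes) -> str:
        self.calls += 1
        return f"<{len(audio)} bytes transcribed>"


class CachedTranscriber:
    """Wraps any provider and reuses results for audio it has already seen."""

    def __init__(self, provider: ASRProvider) -> None:
        self.provider = provider
        self._cache: dict[tuple[str, str], str] = {}

    def transcribe(self, audio: bytes) -> str:
        # Key on provider name and content hash, so identical audio sent to
        # the same provider is transcribed only once.
        key = (self.provider.name, hashlib.sha256(audio).hexdigest())
        if key not in self._cache:
            self._cache[key] = self.provider.transcribe(audio)
        return self._cache[key]
```

Under these assumptions, adding a new backend means writing one more `ASRProvider` subclass; the caching wrapper needs no changes.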
Key Features
01 Built-in rate limiting for online ASR services to prevent API overload
02 Configurable ASR provider via YAML file or environment variables
03 Caching mechanism for ASR results to improve efficiency
04 Extendable architecture for integrating new ASR providers
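As a rough illustration of the first two features, the Python sketch below shows one way provider selection and rate limiting could work. It is not the project's implementation: the environment variable `ASR_PROVIDER`, the config key `asr_provider`, the `whisper` default, and both helper names are assumptions made for this example. The `file_config` dictionary stands in for an already-parsed YAML file.

```python
import os
import time


def resolve_provider(file_config: dict, env=os.environ) -> str:
    """Pick a provider: environment variable first, then file value, then default.

    The names ASR_PROVIDER and asr_provider are hypothetical.
    """
    return env.get("ASR_PROVIDER") or file_config.get("asr_provider", "whisper")


class MinIntervalLimiter:
    """Enforces a minimum gap, in seconds, between consecutive online requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None      # time at which the previous request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A caller would invoke `limiter.wait()` immediately before each request to an online service; a local Whisper run would not need the limiter at all.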