Synthesizes speech from text using the VOICEVOX engine within an MCP server environment.
Integrates ElevenLabs text-to-speech API to generate audio from text, manage voices, and track generation history.
Enables AI agents to compose, mix, and master music tracks within the REAPER digital audio workstation.
Enables AI assistants to initiate and manage voice calls using Twilio and OpenAI.
Enables text-to-speech capabilities using the Rime API, playing audio through the system's native audio player.
Enables interaction with ElevenLabs' Text to Speech and audio processing APIs through the Model Context Protocol.
Extracts watermark-free video links, video captions, and audio transcriptions from Douyin (the Chinese counterpart of TikTok) share links.
Provides speech processing services, including audio validation, speech transcription, and voice activity detection, using Alibaba's FunASR library.
Provides text-to-speech generation with automatic audio playback using the Chatterbox TTS model.
Provides headless, zero-runtime video and audio editing capabilities using FFmpeg and MCP.
Provides access to AI Xiaozhi's voice and smart-assistant features through a versatile Python-based client, with no dedicated hardware required.
Provides an MCP server to expose REAPER Digital Audio Workstation functionality via a clean API.
Provides a comprehensive system for training and running inference with singing voice models, complete with development and testing environments.
Empower AI agents and desktop clients to generate music through natural language commands using an advanced AI music platform.
Transcribe video and audio content using multiple automatic speech recognition (ASR) providers, including local Whisper models and online services like JianYing (CapCut) and Bcut (Bilibili).
Generate infinite-length, high-quality talking head videos from a single image and audio input.
Integrates OpenAI's Text-to-Speech API into Claude Code, providing developers with audio feedback directly within their coding environment.
Synthesize realistic audio using the Qwen3-TTS 1.7B model with advanced voice design and cloning capabilities.
Provides comprehensive control over ProPresenter presentations by exposing a wide range of its API functionality through a Model Context Protocol (MCP) server.
Generate natural-sounding audio from text with multi-voice synthesis, emotional speech, and real-time streaming capabilities.
Generates natural-sounding audio from text for AI assistants and developers, offering multi-voice synthesis, real-time streaming, and SSML support.
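All of the servers above expose their functionality as MCP tools, which clients invoke with JSON-RPC 2.0 `tools/call` requests. A minimal sketch of such a request for a hypothetical text-to-speech tool (the tool name and argument keys are illustrative, not taken from any specific server listed here):

```python
import json

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message.

    MCP clients send messages of this shape to a connected server;
    the tool name and arguments passed in are server-specific.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical TTS invocation -- actual tool names and schemas vary per server.
request = make_tool_call(1, "synthesize_speech", {"text": "Hello", "voice": "default"})
```

In practice an MCP client library (such as the official SDKs) handles framing, transport, and response parsing; the snippet only shows the request payload each of these servers ultimately receives.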