Generates MP3 files from text using the Kokoro-TTS model and optionally uploads them to S3.
Enables interaction with FFmpeg for common media operations via a stdio MCP server.
Enables video, image, and audio generation through RunwayML and Luma AI APIs using text and image prompts.
Enables AI assistants to control Ableton Live in real-time through a standardized protocol interface.
Provides a Model Context Protocol (MCP) server implementation for real-time speech-to-text transcription using the ElevenLabs Scribe API.
Uploads files to Qiniu Cloud Storage for easy referencing of audio and image content.
Analyzes images, audio, and videos using Google's Gemini AI.
Provides local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
Provides an enhanced Model Context Protocol (MCP) server for interacting with ElevenLabs' text-to-speech and audio processing APIs, specifically designed for conversational AI agents.
Transforms Bilibili video content into structured notes, provides intelligent Q&A, and transcribes audio.
Provides a Model Context Protocol server integrating Google AI Studio and Gemini API for multi-modal content generation, file processing, PDF-to-Markdown conversion, image analysis, and audio transcription.
Enables AI models like Claude to directly control Strudel.cc for AI-assisted music generation and live coding.
Transforms coding agents into voice-enabled companions by providing real-time audio notifications and interactions.
Converts text to speech using the Minimax AI API and automatically uploads generated audio files to Amazon S3.
Integrates video, audio, and image processing with AI and MCP support for natural-language-driven media content creation.
Provides an MCP server to access Deepgram's advanced speech recognition and text-to-speech functionalities.
Transcribes ScreenPal videos using local AI models, generating comprehensive audio transcripts and visual descriptions without cloud dependencies.
Provides programmatic access to Apple Voice Memos on macOS, enabling AI assistants to interact with voice recordings.
Provides a production-ready multi-voice text-to-speech library and MCP server for AI assistants, enabling real-time streaming, SSML, emotional speech, and sound effects.
Provides an AI-powered text-to-speech library for generating expressive, human-like voices for AI agents, featuring multi-voice synthesis, real-time streaming, SSML support, and sound effects.
Enables open-source web scraping and multimodal data extraction for AI model context and data gathering.