Fish Audio
Integrates Fish Audio's Text-to-Speech API with large language models, enabling natural language-driven speech synthesis through the Model Context Protocol.
About
The Fish Audio server is an MCP (Model Context Protocol) server that connects Fish Audio's Text-to-Speech (TTS) API to LLMs such as Claude, so a model can generate natural-sounding speech from text. It supports real-time audio streaming, multiple voice models selected by reference ID, and several audio formats. Developers and AI agents can use it for applications ranging from interactive conversational agents to dynamic content generation. The server is configured through environment variables.
Key Features
- Support for custom voice models via reference IDs
- High-quality TTS using Fish Audio's models
- Multiple audio formats including MP3, WAV, PCM, and Opus
- Real-time audio streaming for low-latency applications
- Flexible configuration via environment variables
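As an illustration of environment-based configuration, the following is a minimal sketch of how such settings might be loaded and validated. The variable names (`FISH_API_KEY`, `FISH_REFERENCE_ID`, `FISH_AUDIO_FORMAT`, `FISH_STREAMING`), the `TTSConfig` class, and the defaults are assumptions made for this example and are not taken from the server's documentation. Only the list of audio formats comes from the feature list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Formats named in the feature list above.
SUPPORTED_FORMATS = ("mp3", "wav", "pcm", "opus")


@dataclass(frozen=True)
class TTSConfig:
    """Hypothetical settings container; field names are illustrative."""
    api_key: str
    reference_id: Optional[str]
    audio_format: str
    streaming: bool


def load_config(env: Mapping[str, str] = os.environ) -> TTSConfig:
    """Build a TTSConfig from environment variables (names are assumed)."""
    api_key = env.get("FISH_API_KEY", "").strip()
    if not api_key:
        raise ValueError("FISH_API_KEY is required")

    # Normalize the format and reject anything outside the supported list.
    audio_format = env.get("FISH_AUDIO_FORMAT", "mp3").strip().lower()
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")

    # An empty or missing reference ID means "use the default voice".
    reference_id = env.get("FISH_REFERENCE_ID", "").strip() or None

    # Treat a few common truthy strings as enabling streaming.
    flag = env.get("FISH_STREAMING", "false").strip().lower()
    streaming = flag in ("1", "true", "yes")

    return TTSConfig(api_key, reference_id, audio_format, streaming)
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to test without touching the real environment.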
Use Cases
- Synthesizing speech using specific custom voice models
- Performing real-time, low-latency audio streaming for interactive applications
- Generating speech from text for AI conversational agents
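The first two use cases can be sketched as a direct HTTP call using only the Python standard library. This is not the MCP server's own implementation. The endpoint URL, the bearer-token header, and the JSON field names (`text`, `reference_id`, `format`) are assumptions about Fish Audio's TTS API and should be checked against its official documentation. The helper names `build_tts_request` and `stream_audio` are hypothetical.

```python
import json
import urllib.request
from typing import Iterator, Optional

# Assumed endpoint; verify against the Fish Audio API documentation.
TTS_URL = "https://api.fish.audio/v1/tts"


def build_tts_request(
    text: str,
    api_key: str,
    reference_id: Optional[str] = None,
    audio_format: str = "mp3",
) -> urllib.request.Request:
    """Build (but do not send) a TTS request; field names are assumed."""
    payload = {"text": text, "format": audio_format}
    if reference_id:
        # Selects a specific custom voice model.
        payload["reference_id"] = reference_id
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_audio(
    request: urllib.request.Request, chunk_size: int = 4096
) -> Iterator[bytes]:
    """Yield audio bytes as they arrive instead of buffering the whole file."""
    with urllib.request.urlopen(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

A caller could write each chunk from `stream_audio(...)` to a file or an audio player as it arrives, which is what keeps latency low in interactive applications.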