Introduction
This skill covers building a self-hosted, production-ready text-to-speech (TTS) system that does not depend on commercial API providers. It walks through setting up a FastAPI server around the MeloTTS model, with support for six languages, multiple speaker profiles, and streamed audio output to reduce latency. It targets the LiveKit ecosystem and includes a plugin that acts as a drop-in replacement for the standard TTS interface, along with test suites and deployment documentation for running in private or air-gapped environments.
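One common way to keep streaming latency low is to synthesize text one sentence at a time and send audio as soon as each sentence is ready, rather than waiting for the whole utterance. The sketch below illustrates that idea using only the standard library. It is not the skill's actual implementation: `split_sentences`, `stream_speech`, and `fake_synthesize` are hypothetical names, and `fake_synthesize` is a stand-in that returns silence where a real server would call the MeloTTS model.

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Split text after sentence-ending punctuation that is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_speech(
    text: str,
    synthesize: Callable[[str], bytes],
    chunk_size: int = 4096,
) -> Iterator[bytes]:
    """Yield audio in fixed-size chunks, one sentence at a time.

    The first chunk is available as soon as the first sentence has been
    synthesized, so playback can start before the rest is processed.
    """
    for sentence in split_sentences(text):
        audio = synthesize(sentence)  # assumed to return raw audio bytes
        for start in range(0, len(audio), chunk_size):
            yield audio[start:start + chunk_size]


def fake_synthesize(sentence: str) -> bytes:
    """Hypothetical placeholder: two zero bytes (silence) per input character."""
    return b"\x00\x00" * len(sentence)


if __name__ == "__main__":
    for chunk in stream_speech("Hi there. Bye!", fake_synthesize, chunk_size=8):
        print(len(chunk))
```

In a FastAPI server, a generator like `stream_speech` could be passed to `StreamingResponse` so that each chunk is sent to the client as it is produced. The chunk size and the sentence-splitting rule are illustrative choices, not values taken from the skill.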