About
This skill provides specialized guidance for building production-grade voice AI experiences, with a focus on the trade-off between audio quality and latency budget. It covers both native voice-to-voice models, such as OpenAI's Realtime API, and modular pipelines that use Deepgram for speech-to-text transcription and ElevenLabs for speech synthesis. Developers can apply these patterns to implement low-latency WebRTC audio handling, robust Voice Activity Detection (VAD), and user barge-in (letting the user interrupt the assistant mid-response), so that conversations with the AI feel natural and responsive.
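As a rough illustration of how VAD and barge-in fit together, here is a minimal Python sketch. It assumes 16 kHz mono audio split into 20 ms frames of float samples in [-1.0, 1.0], and it uses a simple energy (RMS) threshold. This is not the VAD used by any of the services named above. `frame_rms`, `BargeInDetector`, and the threshold and frame-count defaults are hypothetical and would need tuning. Real deployments often use a trained VAD model instead of a fixed threshold.

```python
import math
from dataclasses import dataclass


def frame_rms(samples: list[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class BargeInDetector:
    """Hypothetical energy-threshold VAD that signals a barge-in when the user
    keeps speaking over the assistant for `min_speech_frames` consecutive frames."""

    threshold: float = 0.02     # assumed RMS level treated as speech; tune per microphone
    min_speech_frames: int = 3  # 3 x 20 ms frames = 60 ms of sustained speech
    _run: int = 0               # current count of consecutive speech frames

    def process(self, frame: list[float], assistant_speaking: bool) -> bool:
        """Feed one frame; return True when assistant playback should be interrupted."""
        if frame_rms(frame) >= self.threshold:
            self._run += 1
        else:
            self._run = 0  # any quiet frame resets the speech run
        return assistant_speaking and self._run >= self.min_speech_frames


if __name__ == "__main__":
    silence = [0.0] * 320        # 20 ms of silence at 16 kHz
    speech = [0.1, -0.1] * 160   # 20 ms synthetic "speech" frame, RMS = 0.1
    det = BargeInDetector()
    frames = [silence, speech, speech, speech]
    print([det.process(f, assistant_speaking=True) for f in frames])
    # [False, False, False, True]
```

Requiring several consecutive speech frames before interrupting adds a small amount of latency (60 ms with these defaults). In exchange, it cuts down on false barge-ins caused by clicks or brief background noise.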