Introduction
The Voice Agent Architect skill provides specialized guidance for building production-grade voice interfaces that feel natural and responsive. It bridges the gap between complex audio processing and LLM logic, offering implementation patterns for both modern Speech-to-Speech (S2S) models and traditional STT-LLM-TTS pipelines. By focusing on the 'physics of latency,' this skill helps developers implement critical features like barge-in detection, semantic Voice Activity Detection (VAD), and turn-taking logic while avoiding common anti-patterns that lead to awkward or robotic user experiences.
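To make the barge-in and turn-taking ideas concrete, here is a minimal sketch of a turn manager that cuts agent playback when the user starts talking over it. It is an illustration under stated assumptions, not the skill's own implementation: the names `TurnManager`, `Turn`, and `barge_in_frames` are hypothetical, and the per-frame `user_is_speaking` flag is assumed to come from whatever VAD the pipeline already runs.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Turn(Enum):
    LISTENING = auto()       # waiting for or receiving user speech
    AGENT_SPEAKING = auto()  # synthesized audio is being played back


@dataclass
class TurnManager:
    # Hypothetical tuning knob: consecutive voiced frames required before
    # user speech is treated as an interruption rather than noise.
    barge_in_frames: int = 3
    state: Turn = Turn.LISTENING
    _voiced_run: int = 0

    def start_agent_speech(self) -> None:
        """Mark the start of agent playback and reset the voiced-frame counter."""
        self.state = Turn.AGENT_SPEAKING
        self._voiced_run = 0

    def on_frame(self, user_is_speaking: bool) -> bool:
        """Feed one VAD decision; return True if playback should be cut."""
        if self.state is not Turn.AGENT_SPEAKING:
            return False  # nothing to interrupt while listening
        # Count consecutive voiced frames; any unvoiced frame resets the run.
        self._voiced_run = self._voiced_run + 1 if user_is_speaking else 0
        if self._voiced_run >= self.barge_in_frames:
            self.state = Turn.LISTENING
            self._voiced_run = 0
            return True
        return False
```

Requiring several consecutive voiced frames is one simple way to avoid cutting playback on a cough or a click. The trade-off is added interruption latency of roughly `barge_in_frames` times the frame duration, so the threshold would need tuning against the frame size actually in use.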