Generates high-quality AI-powered podcast narratives and real-time audio using Azure OpenAI's GPT Realtime Mini model via WebSockets.
This skill enables developers to integrate advanced text-to-speech and audio narrative generation into their applications using the Azure OpenAI Realtime API. It provides a complete end-to-end workflow: connecting via WebSockets, handling streaming audio deltas (raw PCM), converting that audio to the standard WAV format, and implementing frontend playback. Ideal for building automated podcast creators, interactive voice assistants, or content-to-audio features, this skill offers the implementation patterns and environment configurations needed for production-grade audio output.
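A minimal backend sketch of that workflow is shown below. The endpoint URL, API version, and deployment name are placeholders you must replace with your own Azure resource values, and the event names (`session.update`, `response.audio.delta`, `response.audio_transcript.delta`, `response.done`) follow the publicly documented Realtime API shapes; treat the whole thing as an illustrative assumption, not the skill's exact implementation.

```python
import asyncio
import base64
import json

# Hypothetical endpoint values -- replace with your Azure resource settings.
AZURE_WS_URL = (
    "wss://<your-resource>.openai.azure.com/openai/realtime"
    "?api-version=2024-10-01-preview&deployment=<your-deployment>"
)

def build_session_update(voice: str = "alloy") -> str:
    """Build a session.update event selecting a voice and raw PCM output."""
    return json.dumps({
        "type": "session.update",
        "session": {"voice": voice, "output_audio_format": "pcm16"},
    })

def handle_event(event: dict, audio_chunks: list, transcript_parts: list) -> None:
    """Route streamed deltas: audio arrives base64-encoded, transcripts as text."""
    if event.get("type") == "response.audio.delta":
        audio_chunks.append(base64.b64decode(event["delta"]))
    elif event.get("type") == "response.audio_transcript.delta":
        transcript_parts.append(event["delta"])

async def stream_response(api_key: str) -> bytes:
    """Connect and collect PCM audio until the response completes (sketch)."""
    import websockets  # third-party: pip install websockets
    audio, transcript = [], []
    async with websockets.connect(
        AZURE_WS_URL, additional_headers={"api-key": api_key}
    ) as ws:
        await ws.send(build_session_update())
        await ws.send(json.dumps({"type": "response.create"}))
        async for raw in ws:
            event = json.loads(raw)
            handle_event(event, audio, transcript)
            if event.get("type") == "response.done":
                break
    return b"".join(audio)
```

Keeping `handle_event` separate from the socket loop makes the delta routing unit-testable without a live connection.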
Key Features
- Support for multiple expressive voices including Alloy, Echo, and Fable
- PCM to WAV conversion utilities for standard browser audio playback
- Asynchronous handling of simultaneous audio and transcript deltas
- Full-stack implementation patterns for Python backends and JS frontends
- Real-time audio streaming via Azure OpenAI WebSocket integration
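The PCM-to-WAV conversion mentioned above can be done with the standard library alone: browsers will not play headerless PCM, so the raw bytes need a WAV container. The 24 kHz mono 16-bit parameters below are assumed defaults for Realtime API output; adjust them to match your session configuration.

```python
import io
import wave

def pcm16_to_wav(pcm_bytes: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container for browser playback."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()
```

The returned bytes can be served directly to the frontend and played via an `<audio>` element or the Web Audio API.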
Use Cases
- Low-latency voice narration for interactive AI applications
- Automated podcast generation from blog posts or text articles
- Accessibility-focused text-to-speech features