Generates high-quality AI-powered podcast narratives and real-time audio using Azure OpenAI's GPT Realtime Mini model via WebSockets.
This skill enables developers to integrate advanced text-to-speech and audio narrative generation into their applications using the Azure OpenAI Realtime API. It provides a complete end-to-end workflow: connecting via WebSockets, handling streaming audio deltas (raw PCM), converting that audio to the standard WAV format, and implementing frontend playback. Ideal for building automated podcast creators, interactive voice assistants, or content-to-audio features, this skill offers the implementation patterns and environment configurations needed for production-grade audio output.
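A minimal backend sketch of that workflow is shown below. The endpoint URL, API version, and deployment name are placeholders you must replace with your own Azure resource values, and the event names (`session.update`, `response.audio.delta`, `response.audio_transcript.delta`, `response.done`) follow the publicly documented Realtime API shapes; treat the whole thing as an illustrative assumption, not the skill's exact implementation.

```python
import asyncio
import base64
import json

# Hypothetical endpoint values -- replace with your Azure resource settings.
AZURE_WS_URL = (
    "wss://<your-resource>.openai.azure.com/openai/realtime"
    "?api-version=2024-10-01-preview&deployment=<your-deployment>"
)

def build_session_update(voice: str = "alloy") -> str:
    """Build a session.update event selecting a voice and raw PCM output."""
    return json.dumps({
        "type": "session.update",
        "session": {"voice": voice, "output_audio_format": "pcm16"},
    })

def handle_event(event: dict, audio_chunks: list, transcript_parts: list) -> None:
    """Route streamed deltas: audio arrives base64-encoded, transcripts as text."""
    if event.get("type") == "response.audio.delta":
        audio_chunks.append(base64.b64decode(event["delta"]))
    elif event.get("type") == "response.audio_transcript.delta":
        transcript_parts.append(event["delta"])

async def stream_response(api_key: str) -> bytes:
    """Connect and collect PCM audio until the response completes (sketch)."""
    import websockets  # third-party: pip install websockets
    audio, transcript = [], []
    async with websockets.connect(
        AZURE_WS_URL, additional_headers={"api-key": api_key}
    ) as ws:
        await ws.send(build_session_update())
        await ws.send(json.dumps({"type": "response.create"}))
        async for raw in ws:
            event = json.loads(raw)
            handle_event(event, audio, transcript)
            if event.get("type") == "response.done":
                break
    return b"".join(audio)
```

Keeping `handle_event` separate from the socket loop makes the delta routing unit-testable without a live connection.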
Key Features
- Support for multiple expressive voices including Alloy, Echo, and Fable
- PCM to WAV conversion utilities for standard browser audio playback
- Asynchronous handling of simultaneous audio and transcript deltas
- Full-stack implementation patterns for Python backends and JS frontends
- Real-time audio streaming via Azure OpenAI WebSocket integration
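The PCM-to-WAV conversion mentioned above can be done with the standard library alone: browsers will not play headerless PCM, so the raw bytes need a WAV container. The 24 kHz mono 16-bit parameters below are assumed defaults for Realtime API output; adjust them to match your session configuration.

```python
import io
import wave

def pcm16_to_wav(pcm_bytes: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container for browser playback."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()
```

The returned bytes can be served directly to the frontend and played via an `<audio>` element or the Web Audio API.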
Use Cases
- Low-latency voice narration for interactive AI applications
- Automated podcast generation from blog posts or text articles
- Accessibility-focused text-to-speech features