How is the final audio file formatted?

The skill captures PCM audio (24kHz, 16-bit, mono) and includes scripts to convert this data into a standard base64-encoded WAV file for frontend playback.
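The conversion described above can be sketched with Python's standard library alone. This is a minimal illustration, not the skill's bundled script: the function name `pcm_to_wav_base64` is hypothetical, and the only values taken from the answer above are the format parameters (24 kHz, 16-bit, mono).

```python
import base64
import io
import wave

# Format parameters stated in the FAQ answer.
SAMPLE_RATE = 24_000   # 24 kHz
SAMPLE_WIDTH = 2       # 16-bit samples = 2 bytes
CHANNELS = 1           # mono


def pcm_to_wav_base64(pcm: bytes) -> str:
    """Wrap raw PCM bytes in a WAV container and return it base64-encoded.

    Hypothetical helper for illustration; the skill's own script may differ.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    # The in-memory buffer stays open after the wave writer is closed.
    return base64.b64encode(buf.getvalue()).decode("ascii")
```

A frontend could then play the result by assigning `"data:audio/wav;base64," + encoded` to an audio element's source.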

Does it support different voices?

Yes, it supports several distinct voice profiles: Alloy (Neutral), Echo (Warm), Fable (Expressive), Onyx (Deep), Nova (Friendly), and Shimmer (Clear).
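As a rough sketch of how a voice choice might be passed along, the snippet below validates a voice name against the six profiles listed above and builds a session configuration message. The payload shape is an assumption modeled on the Realtime API's `session.update` event; the exact field layout differs between API versions, and `build_session_update` is a hypothetical helper, not part of the skill.

```python
import json

# Voice profiles and descriptions as listed in the FAQ answer.
VOICES = {
    "alloy": "Neutral",
    "echo": "Warm",
    "fable": "Expressive",
    "onyx": "Deep",
    "nova": "Friendly",
    "shimmer": "Clear",
}


def build_session_update(voice: str, instructions: str) -> str:
    """Return a JSON session-configuration message for the chosen voice.

    Assumed payload shape; check the API version you target.
    """
    key = voice.lower()
    if key not in VOICES:
        raise ValueError(f"Unknown voice {voice!r}; expected one of {sorted(VOICES)}")
    return json.dumps({
        "type": "session.update",
        "session": {"voice": key, "instructions": instructions},
    })
```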

What environment configuration is required?

You need an Azure OpenAI API key, a specific Realtime endpoint, and a deployment name for the gpt-realtime-mini model.
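One way to check these three settings at startup is sketched below. The environment variable names are assumptions chosen for illustration; the skill's actual names are not given here, so adjust them to match its documentation.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; the skill may use different ones.
ENV_API_KEY = "AZURE_OPENAI_API_KEY"
ENV_ENDPOINT = "AZURE_OPENAI_REALTIME_ENDPOINT"
ENV_DEPLOYMENT = "AZURE_OPENAI_DEPLOYMENT"


@dataclass(frozen=True)
class RealtimeConfig:
    api_key: str
    endpoint: str
    deployment: str


def load_config(env: Mapping[str, str] = os.environ) -> RealtimeConfig:
    """Read the three required settings and fail early if any is missing."""
    names = (ENV_API_KEY, ENV_ENDPOINT, ENV_DEPLOYMENT)
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing configuration: " + ", ".join(missing))
    return RealtimeConfig(
        api_key=env[ENV_API_KEY],
        endpoint=env[ENV_ENDPOINT],
        deployment=env[ENV_DEPLOYMENT],
    )
```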

What API does this Claude Code skill use?

This skill utilizes the Azure OpenAI Realtime API, specifically optimized for the gpt-realtime-mini model to provide low-latency audio generation.

Podcast Generation & Audio Synthesis

Name: Podcast Generation & Audio Synthesis
Author: sickn33

Stars: 31,722
Category: Content Management

Generates high-quality audio narratives and podcasts from text content using Azure OpenAI's Realtime API.

The Podcast Generation skill empowers Claude to transform written text into natural-sounding audio narratives by leveraging Azure OpenAI's Realtime API. It manages the technical overhead of WebSocket connections, handles streaming PCM audio chunks, and provides the necessary logic to convert raw audio data into standardized WAV formats. This skill is particularly useful for developers and content creators who want to automate the production of podcasts, voiceovers, or accessibility-focused audio versions of documentation directly within their development workflow.
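The chunk handling described above might look roughly like the following. This sketch leaves out the WebSocket connection and operates on already-parsed event dictionaries. The event type names are assumptions based on the Realtime API's audio and transcript delta events, which are named differently across API versions, and `collect_audio` is a hypothetical helper.

```python
import base64
from typing import Iterable, Mapping, Tuple

# Assumed event names; both older and newer spellings are accepted.
AUDIO_DELTA = {"response.audio.delta", "response.output_audio.delta"}
TRANSCRIPT_DELTA = {
    "response.audio_transcript.delta",
    "response.output_audio_transcript.delta",
}
DONE = "response.done"


def collect_audio(events: Iterable[Mapping[str, str]]) -> Tuple[bytes, str]:
    """Accumulate base64 PCM chunks and transcript text until the response ends."""
    pcm = bytearray()
    transcript = []
    for event in events:
        kind = event.get("type")
        if kind in AUDIO_DELTA:
            # Each audio delta carries a base64-encoded slice of raw PCM.
            pcm.extend(base64.b64decode(event["delta"]))
        elif kind in TRANSCRIPT_DELTA:
            transcript.append(event["delta"])
        elif kind == DONE:
            break
    return bytes(pcm), "".join(transcript)
```

The returned PCM bytes can then be wrapped in a WAV container for playback.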

Key Features

01 Synchronized transcript generation during audio output

02 Automated PCM to WAV audio format conversion

03 Real-time audio streaming via WebSocket integration

04 31,722 GitHub stars

05 Support for multiple voice profiles including Alloy, Echo, and Shimmer

06 Customizable narrator instructions for tone and style control

Use Cases

01 Converting technical documentation or blog posts into automated podcasts

02 Generating realistic voiceovers for product demonstrations and tutorials

03 Creating interactive voice-enabled AI agents and narrative experiences
