What API keys are required to use this skill?

Set INWORLD_API_KEY, HUME_API_KEY, and GEMINI_API_KEY as environment variables. Each key enables its respective service (Inworld, Hume, and Google Gemini).
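A quick pre-flight check can confirm the keys are present before any audio is generated. The sketch below is a minimal illustration, not part of the skill itself: the three variable names come from the answer above, while the helper name missing_keys is hypothetical.

```python
import os

# Variable names taken from the answer above.
REQUIRED_KEYS = ("INWORLD_API_KEY", "HUME_API_KEY", "GEMINI_API_KEY")


def missing_keys(env=None):
    """Return the required key names that are unset or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to test. (Hypothetical helper.)
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All TTS API keys are set.")
```

If you only plan to use one provider, you may only need that provider's key; the check above simply reports which ones are absent.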

How do I generate a podcast with multiple speakers?

Use the gemini-tts command with the --multi-speaker flag and provide a text script that includes speaker labels like 'Speaker 1:' and 'Speaker 2:'.
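To illustrate the speaker-label format, here is a small sketch that turns a list of dialogue turns into a labeled script. The 'Speaker N:' prefix comes from the answer above; the function name format_script and the one-turn-per-line layout are assumptions, and the skill may accept other layouts.

```python
def format_script(turns):
    """Build a labeled script from (speaker_number, text) pairs.

    Each turn becomes one line of the form 'Speaker N: text'.
    (Hypothetical helper; the one-line-per-turn layout is an assumption.)
    """
    lines = []
    for speaker, text in turns:
        lines.append(f"Speaker {speaker}: {text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    script = format_script([
        (1, "Welcome back to the show."),
        (2, "Thanks, great to be here."),
    ])
    print(script)
```

The resulting text can be saved to a file or passed as the script input when running gemini-tts with --multi-speaker.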

Can I control the emotion of the generated speech?

Yes, you can use style descriptions with Hume (e.g., 'whispering', 'excited') or inline audio markups like [happy] or [laughing] with Inworld.
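As a rough sketch of the inline-markup approach, the helper below prefixes a line of dialogue with an Inworld-style tag. Only [happy] and [laughing] are named above, so the set of allowed tags is limited to those two; placing the tag at the start of the text and the name add_markup are assumptions.

```python
# Only the two markups mentioned above; others may exist but are not assumed here.
KNOWN_MARKUPS = {"happy", "laughing"}


def add_markup(text, markup):
    """Prefix `text` with an inline audio markup such as [happy].

    Raises ValueError for tags not named in the FAQ, to avoid
    silently sending a tag the provider might not recognize.
    (Hypothetical helper.)
    """
    if markup not in KNOWN_MARKUPS:
        raise ValueError(f"Unknown markup: {markup!r}")
    return f"[{markup}] {text}"


if __name__ == "__main__":
    print(add_markup("We actually won the match!", "happy"))
```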

Which TTS providers are supported by this skill?

The skill supports Hume Octave for dynamic emotional voices, Inworld for consistent character voices, and Google Gemini for multi-speaker podcast generation.

Can I play the generated audio directly from the terminal?

Yes. On macOS, append '&& afplay output.mp3' to the generation command to play the audio as soon as the file is written.
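If you drive generation from a script rather than the shell, the same playback step can be sketched in Python. afplay is the macOS player named above; the helper names playback_command and play are hypothetical, and on other platforms the sketch simply returns no command instead of guessing a player.

```python
import subprocess
import sys


def playback_command(path, platform=None):
    """Return the argv list that plays `path`, or None if unsupported.

    Only macOS ('darwin') is handled, matching the afplay tip above.
    (Hypothetical helper.)
    """
    platform = sys.platform if platform is None else platform
    if platform == "darwin":
        return ["afplay", path]
    return None


def play(path):
    """Play the file if a player is known; return True when playback ran."""
    command = playback_command(path)
    if command is None:
        return False
    subprocess.run(command, check=True)  # blocks until playback finishes
    return True
```

Calling play("output.mp3") after generation mirrors the '&& afplay output.mp3' chain.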

Text-to-Speech (TTS) Engine

Name: Text-to-Speech (TTS) Engine
Author: mshuffett

Content Management

Generates high-fidelity speech audio and multi-speaker podcasts using Hume, Inworld, and Google Gemini.

This skill integrates industry-leading text-to-speech providers into the Claude Code environment, enabling the generation of high-quality audio files, voiceovers, and podcasts directly from the command line. It features advanced emotion control via Hume Octave, high-consistency character voices with Inworld TTS, and native multi-speaker conversation generation using Google Gemini 2.5. Whether you are creating accessible content, automated podcasts, or expressive character dialogue, this skill provides a unified interface for professional-grade audio synthesis with precise control over tone, speed, and delivery.

Key Features

01 High-quality character voices with Inworld TTS and TTS Max models

02 0 GitHub stars

03 Native multi-speaker podcast generation using Google Gemini 2.5

04 Advanced emotion control using acting instructions and inline audio markups

05 Customizable audio output formats and direct CLI playback integration

06 Dynamic voice generation with emotional intelligence via Hume Octave

Use Cases

01 Prototyping character dialogue with specific emotional nuances for games

02 Creating expressive voiceovers for presentations or accessible documentation

03 Generating automated multi-speaker podcasts from written scripts

Install with 🐟 Skill.Fish

npx skillfish add mshuffett/dotfiles text-to-speech
