Which languages can this TTS skill speak?

The skill supports American English, British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese.

Does this skill require an internet connection for every use?

No, an internet connection is only required for the initial installation and model download. Once configured, all audio generation occurs locally on your machine.

How do I use this for Japanese or Chinese text?

Install the additional language dependencies first: run 'pip install misaki[ja]' for Japanese or 'pip install misaki[zh]' for Chinese.

Can I change the voice and speed of the audio?

Yes, you can choose from 11 different voices (male and female) and adjust the playback speed between 0.5x for slow practice and 2.0x for fast listening.
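
As a rough illustration of the speed range described above, the sketch below keeps a requested speed within the 0.5x to 2.0x limits. The helper name `clamp_speed` and the constants are hypothetical and not part of the skill itself; how the skill actually handles out-of-range values is not stated here.

```python
# Hypothetical helper: keep a requested speed inside the documented range.
MIN_SPEED = 0.5  # slow practice
MAX_SPEED = 2.0  # fast listening


def clamp_speed(requested: float) -> float:
    """Return the requested speed, limited to the 0.5x-2.0x range."""
    return max(MIN_SPEED, min(MAX_SPEED, requested))
```

For example, `clamp_speed(3.0)` returns the upper limit and `clamp_speed(1.25)` is returned unchanged.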

What are the system requirements for this skill?

You need Python 3.9+, mlx-audio installed, and approximately 2GB of disk space for the model cache. It is optimized for performance on Apple Silicon or modern hardware.
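
The requirements above could be checked before first use with something like the following sketch. It uses only the Python standard library. The function name `check_requirements` is hypothetical, the import name `mlx_audio` is assumed to match the mlx-audio package, and the home directory is used as a stand-in for wherever the model cache actually lives.

```python
import importlib.util
import shutil
import sys
from pathlib import Path

REQUIRED_PYTHON = (3, 9)
REQUIRED_FREE_BYTES = 2 * 1024**3  # roughly 2GB for the model cache


def check_requirements(cache_dir: Path = Path.home()) -> dict:
    """Report whether each stated requirement appears to be met."""
    free_bytes = shutil.disk_usage(cache_dir).free
    return {
        "python_ok": sys.version_info >= REQUIRED_PYTHON,
        # Assumption: the mlx-audio package is importable as `mlx_audio`.
        "mlx_audio_installed": importlib.util.find_spec("mlx_audio") is not None,
        "disk_ok": free_bytes >= REQUIRED_FREE_BYTES,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

The check only reports status; it does not install anything or download the model.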

Text-to-Speech Generator

Name: Text-to-Speech Generator
Author: WarrenZhu050413

Learning & Documentation

Generates and plays high-quality multilingual audio locally using the mlx-audio Kokoro model.

This skill empowers Claude to convert text into spoken audio across nine different languages, making it an ideal companion for language learning, pronunciation verification, and accessibility. By utilizing the mlx-audio framework and the Kokoro model, it generates high-fidelity speech locally on your machine, offering 11 distinct voice profiles with adjustable speed controls. Whether you need to hear the correct pronunciation of a complex term or create audio snippets for educational content, this skill provides a seamless, low-latency workflow directly within your terminal environment.

Key Features

01 Supports 9 languages including English, Spanish, Japanese, and Mandarin

02 Adjustable playback speed ranging from 0.5x to 2.0x

03 5 GitHub stars

04 Offers 11 high-quality voice profiles with American and British accents

05 Seamless integration with macOS afplay for instant playback

06 Automated local server management for efficient audio processing

Use Cases

01 Verifying the pronunciation of technical terms or foreign language phrases

02 Improving accessibility by reading text-heavy documentation or code comments aloud

03 Developing audio-based language learning tools and study materials
