Can it translate non-English audio?

Yes. Using Whisper's 'translate' task, the skill can process speech in many languages and output the translation directly as English text.
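A minimal sketch of what a translate request might look like with the `fal_client` Python package. The endpoint ID `fal-ai/whisper`, the input fields `audio_url`, `task`, and `language`, and the `text` key in the result are assumptions based on fal.ai's hosted Whisper model. Check them against the current model page before relying on them. The helper names are hypothetical.

```python
from typing import Optional

# Assumed endpoint ID and field names; verify on the fal.ai model page.
WHISPER_ENDPOINT = "fal-ai/whisper"


def build_translate_request(audio_url: str, source_language: Optional[str] = None) -> dict:
    """Build the input payload for Whisper's 'translate' task (speech -> English text)."""
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio_url must be an http(s) URL")
    payload = {"audio_url": audio_url, "task": "translate"}
    if source_language:
        # Optional hint for the spoken language (assumed field); otherwise Whisper auto-detects.
        payload["language"] = source_language
    return payload


def translate_to_english(audio_url: str, source_language: Optional[str] = None) -> str:
    """Submit the request; fal_client reads the API key from the FAL_KEY environment variable."""
    import fal_client  # imported lazily so the payload builder works without the package

    result = fal_client.subscribe(
        WHISPER_ENDPOINT,
        arguments=build_translate_request(audio_url, source_language),
    )
    return result["text"]  # assumed output key holding the full English text
```

The payload builder is kept separate from the network call so the request shape can be inspected or tested without an API key.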

Can I generate subtitles using this skill?

Yes. It provides implementation patterns for Whisper's segment-level chunking, which returns per-segment timestamps you can use to build SRT and other subtitle files.
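As an illustration, here is a sketch that turns segment-level chunks into an SRT file. It assumes each chunk looks like `{"timestamp": [start, end], "text": "..."}` with times in seconds, which is the shape commonly returned by Whisper pipelines. Confirm the exact output format of the fal.ai endpoint you call. The function names are hypothetical, and the 2-second fallback for a missing end time is an arbitrary choice.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: Iterable[Mapping]) -> str:
    """Convert Whisper-style chunks into numbered SRT cues separated by blank lines."""
    cues = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: the final chunk may lack an end time; give it a 2 s cue.
            end = start + 2.0
        text = chunk["text"].strip()
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)
```

For example, `chunks_to_srt([{"timestamp": [0.0, 2.5], "text": " Hello there."}])` produces a single cue numbered 1, running from `00:00:00,000` to `00:00:02,500`.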

Which audio models are supported by this skill?

The skill supports a wide range of models including Whisper (for STT), ElevenLabs, F5-TTS, Kokoro, and XTTS (for TTS and voice cloning).

How do I choose between the different Whisper versions?

The skill includes a comparison guide recommending Whisper Turbo for speed, Whisper Large v3 for maximum accuracy, and standard Whisper for general-purpose high accuracy.

Does it support voice cloning from a reference file?

Yes, the skill includes documentation and code snippets for using F5-TTS and XTTS to clone voices using a reference audio URL and text.
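The following is a minimal sketch of an F5-TTS cloning request via `fal_client`. The endpoint ID `fal-ai/f5-tts` and the fields `gen_text`, `ref_audio_url`, and `ref_text` are assumptions modeled on fal.ai's hosted F5-TTS and should be verified on the model page. The helper names are hypothetical, and the sketch returns the raw response without assuming its shape.

```python
from typing import Optional

# Assumed endpoint ID and field names; verify on the fal.ai model page.
F5_TTS_ENDPOINT = "fal-ai/f5-tts"


def build_clone_request(gen_text: str, ref_audio_url: str, ref_text: Optional[str] = None) -> dict:
    """Build a voice-cloning payload: the text to speak plus a reference voice sample."""
    if not gen_text.strip():
        raise ValueError("gen_text must not be empty")
    if not ref_audio_url.startswith(("http://", "https://")):
        raise ValueError("ref_audio_url must be an http(s) URL")
    payload = {"gen_text": gen_text, "ref_audio_url": ref_audio_url}
    if ref_text:
        # Transcript of the reference clip; it should match what is spoken in the sample.
        payload["ref_text"] = ref_text
    return payload


def clone_voice(gen_text: str, ref_audio_url: str, ref_text: Optional[str] = None) -> dict:
    """Submit the request (requires FAL_KEY) and return the raw response dictionary."""
    import fal_client  # imported lazily so the payload builder works without the package

    return fal_client.subscribe(
        F5_TTS_ENDPOINT,
        arguments=build_clone_request(gen_text, ref_audio_url, ref_text),
    )
```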

Fal.ai Audio Mastery

Name: Fal.ai Audio Mastery
Author: JosiahSiegel


Data Science & ML

Integrates fal.ai audio models for high-accuracy speech-to-text, premium text-to-speech, and advanced voice cloning.

This skill provides a comprehensive interface for the fal.ai audio ecosystem, enabling developers to implement sophisticated audio processing directly through Claude. It supports industry-leading speech-to-text models like OpenAI's Whisper for transcription and translation, alongside premium text-to-speech engines including ElevenLabs, F5-TTS, and Kokoro. Whether you are generating subtitles with precise timestamps, cloning voices from reference samples, or building multilingual speech pipelines, this skill offers the necessary endpoints, formatting patterns, and parameter guides to streamline your audio engineering workflow.

Key Features

01 Advanced voice cloning capabilities using reference audio samples

02 Premium TTS integration with ElevenLabs, Kokoro, and XTTS

03 Comprehensive STT support via Whisper, Whisper Turbo, and Whisper Large v3 models

04 Precise timestamp generation for SRT subtitles and captions

05 Automated speech-to-English translation from the roughly 99 languages Whisper recognizes

Use Cases

01 Automating high-accuracy transcription and subtitle generation for video content

02 Building translation tools that convert foreign-language speech to English text

03 Generating realistic cloned voiceovers for marketing, gaming, or educational apps

How to Install

Install with 🐟 Skill.Fish

npx skillfish add josiahsiegel/claude-plugin-marketplace fal-audio

For use in Claude.ai and ChatGPT, download the skill package directly.