What hardware is required for this skill?

This skill is optimized for Apple Silicon Macs (M1, M2, M3, or M4), where it uses the MLX framework for fast local audio generation.

How does the lip-sync synchronization work?

It computes the RMS (root mean square) of the audio waveform over the span of each video frame and accounts for the playback rate, so that mouth movement tracks the sound.
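A minimal sketch of the per-frame RMS idea, not the skill's actual script. The function name `frame_rms` and its parameters are hypothetical, and it assumes the audio is already decoded into a list of float samples in the range -1.0 to 1.0.

```python
import math

def frame_rms(samples, sample_rate, fps, playback_rate=1.0):
    """Return one RMS value per video frame (hypothetical helper)."""
    # At a playback rate above 1.0 each video frame consumes more source audio,
    # so the window of samples per frame grows by the same factor.
    samples_per_frame = sample_rate * playback_rate / fps
    n_frames = math.ceil(len(samples) / samples_per_frame)
    values = []
    for i in range(n_frames):
        start = int(i * samples_per_frame)
        end = min(int((i + 1) * samples_per_frame), len(samples))
        window = samples[start:end]
        if not window:
            # Guard against an empty trailing window caused by rounding.
            values.append(0.0)
            continue
        values.append(math.sqrt(sum(s * s for s in window) / len(window)))
    return values
```

Each returned value could then drive how far a character's mouth opens on the matching frame, for example by comparing it against a threshold.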

Why is Whisper used in the verification phase?

Whisper transcribes the generated AI voice to compare it against the original text. If the similarity is low, it alerts the user that the audio may be corrupted or truncated.
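The comparison step might look like the sketch below. It assumes the Whisper transcript is already available as a string, and it uses Python's standard `difflib` as a stand-in similarity measure; the skill's real metric and threshold are not stated here, so `check_transcript` and the 0.8 default are assumptions.

```python
import difflib
import unicodedata

def normalize(text):
    """Lowercase, unify character widths, and drop punctuation and spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in text if ch.isalnum())

def transcript_similarity(expected, transcribed):
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    matcher = difflib.SequenceMatcher(None, normalize(expected), normalize(transcribed))
    return matcher.ratio()

def check_transcript(expected, transcribed, threshold=0.8):
    """Return (passed, score). The threshold is a hypothetical default."""
    score = transcript_similarity(expected, transcribed)
    return score >= threshold, score
```

A transcript that stops partway through the script scores low because most of the expected text has no match, which is the truncation signal described above.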

Can I use custom character voices?

Yes, characters and their specific voice instructions can be dynamically defined in the characters.yaml file to customize the personality and tone of the TTS output.

What is the 'double-adjustment' trap mentioned in the guide?

It occurs when the playback rate adjustment is applied twice, once during frame calculation in the script and again in the Remotion UI, which causes severe desync or cuts the audio off.
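The arithmetic behind the trap can be illustrated with a small sketch. The helper `duration_in_frames` and the example numbers (6 seconds of audio, 30 fps, playback rate 1.5) are assumptions chosen for illustration, not values from the guide.

```python
import math

def duration_in_frames(audio_seconds, fps, playback_rate=1.0):
    """Frames needed to play the whole clip at the given playback rate."""
    return math.ceil(audio_seconds / playback_rate * fps)

audio_seconds, fps, rate = 6.0, 30, 1.5

unadjusted = duration_in_frames(audio_seconds, fps)        # 180 frames
correct = duration_in_frames(audio_seconds, fps, rate)     # 120 frames
# The trap: the already-adjusted frame count is divided by the rate again.
double_adjusted = math.ceil(correct / rate)                # 80 frames

# Seconds of source audio that actually fit in the double-adjusted window.
audible_seconds = double_adjusted / fps * rate             # 4.0 of 6.0 seconds
```

In this example the double-adjusted sequence ends after 80 frames, so the last two seconds of source audio never play. Applying the rate in exactly one place avoids the problem.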

Remotion Qwen-TTS Video Generator

Author: kazuph

Content Management

Automates high-quality AI video production using Remotion and Qwen-TTS with a rigorous multi-stage verification workflow.

This skill provides a framework for generating AI-voiced videos with Remotion and Qwen3-TTS, optimized for Apple Silicon via MLX. It covers the production pipeline from script preparation and localized voice generation to lip-sync and frame count calculation. Its core strength is a mandatory verification phase, which uses Whisper to check transcript accuracy and automated audio analysis to detect silences or cutoffs, guarding against common sync errors such as playback rate double-adjustment.

Key Features

01 Optimized Qwen3-TTS voice generation using MLX for Apple Silicon hardware

02 Precise frame calculation logic to prevent audio-visual desync and playback errors

03 15 GitHub stars

04 Multi-stage verification using Whisper for automated transcript similarity checks

05 Automated video-audio analysis for detecting silence gaps and terminal audio cutoffs

06 Automated lip-sync data extraction based on real-time audio waveform RMS
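The silence-gap and terminal-cutoff checks listed above could be approximated as follows. This is a heuristic sketch under stated assumptions: audio is a list of float samples in the range -1.0 to 1.0, and the names `find_silences` and `ends_abruptly` and all thresholds are hypothetical rather than taken from the skill.

```python
def find_silences(samples, sample_rate, threshold=0.01, min_seconds=0.5):
    """Return (start_s, end_s) spans where the signal stays below threshold."""
    min_len = int(min_seconds * sample_rate)
    spans, run_start = [], None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                spans.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # A quiet run that lasts until the end of the clip also counts.
    if run_start is not None and len(samples) - run_start >= min_len:
        spans.append((run_start / sample_rate, len(samples) / sample_rate))
    return spans

def ends_abruptly(samples, sample_rate, threshold=0.01, tail_seconds=0.05):
    """Heuristic: a clip that is still loud in its final moments may be cut off."""
    tail = samples[-max(1, int(tail_seconds * sample_rate)):]
    return any(abs(s) >= threshold for s in tail)
```

Naturally generated speech usually decays into a short quiet tail, so a loud final window is a reasonable prompt to re-check the clip rather than proof of truncation.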

Use Cases

01 Building localized video rendering pipelines on Mac hardware without cloud TTS costs

02 Generating AI-narrated social media content or explainer videos with synchronized lip-syncing

03 Automating high-volume video production for virtual characters like Zundamon

Install with 🐟 Skill.Fish

npx skillfish add kazuph/dotfiles remotion-qwen-tts