Generates realistic audio-driven lip-sync videos from static images using the OmniHuman v1.5 model.
This skill empowers Claude to create lifelike talking head videos by synchronizing character images with audio tracks. Utilizing ByteDance's OmniHuman v1.5 model via fal-ai, it produces high-quality video content where facial expressions and mouth movements naturally reflect the emotional tone of the input audio. It is an ideal tool for content creators, marketers, and developers looking to automate the production of digital avatars, educational content, or social media videos directly within their Claude workflow.
Key Features
- High-fidelity lip-syncing powered by OmniHuman v1.5
- Emotionally expressive facial animations driven by audio input
- Turbo mode for accelerated video generation
- Support for multiple resolutions, including 720p and 1080p
- Seamless integration with common audio formats such as MP3 and WAV
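As a rough sketch of how the options above (resolution, turbo mode, an image plus an audio track) might be assembled into a request for fal-ai's Python client — note that the endpoint ID, parameter names, and accepted values here are assumptions for illustration, not the documented API:

```python
def build_omnihuman_request(image_url, audio_url, resolution="720p", turbo=False):
    """Assemble an argument payload for a lip-sync generation request.

    All parameter names are hypothetical; consult fal.ai's model page
    for the OmniHuman v1.5 endpoint's actual schema.
    """
    if resolution not in ("720p", "1080p"):
        raise ValueError("resolution must be '720p' or '1080p'")
    return {
        "image_url": image_url,    # static character image
        "audio_url": audio_url,    # MP3/WAV track driving the lip sync
        "resolution": resolution,
        "turbo": turbo,            # faster generation, possibly lower quality
    }

# Example usage (requires `pip install fal-client` and a FAL_KEY env var;
# the endpoint ID below is a guess — check fal.ai's model catalog):
#
#   import fal_client
#   args = build_omnihuman_request(
#       "https://example.com/avatar.png",
#       "https://example.com/speech.mp3",
#       resolution="1080p",
#       turbo=True,
#   )
#   result = fal_client.subscribe("fal-ai/omnihuman", arguments=args)
```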
Use Cases
- Automating the production of dubbed video content for global audiences
- Creating realistic talking avatars for social media and marketing campaigns
- Generating educational videos featuring animated historical figures or characters