Do I need a GPU to use this skill effectively?

While Whisper can run on a CPU, using a GPU (CUDA) is highly recommended as it typically provides a 10-20x increase in transcription speed.
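As a minimal sketch of how a script might prefer the GPU and fall back to the CPU, assuming the `openai-whisper` package and PyTorch are installed (the helper names `pick_device` and `load_whisper` are illustrative, not part of the skill):

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # Whisper runs on PyTorch, so this import normally succeeds
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_whisper(model_name: str = "base"):
    """Load a Whisper model on the best available device (hypothetical helper)."""
    import whisper  # provided by the openai-whisper package

    return whisper.load_model(model_name, device=pick_device())
```

Calling `load_whisper("base")` downloads the model on first use, so the actual speedup depends on the hardware and the model size chosen.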

Can Whisper translate audio to languages other than English?

Not directly. Whisper's translation task converts non-English speech into English text only. For any other target language, Whisper transcribes the audio in its original language rather than translating it.
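The difference between the two modes comes down to the `task` option passed to `transcribe`. The sketch below assumes the `openai-whisper` package; `transcribe_options` and `speech_to_text` are illustrative names, and the default model name is only a placeholder choice:

```python
def transcribe_options(to_english: bool) -> dict:
    """Build the decoding options: translate into English, or keep the source language."""
    return {"task": "translate" if to_english else "transcribe"}


def speech_to_text(path: str, to_english: bool = False, model_name: str = "medium") -> str:
    """Run Whisper on one audio file and return the resulting text (hypothetical helper)."""
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    result = model.transcribe(path, **transcribe_options(to_english))
    return result["text"]
```

With `to_english=True` a Japanese recording would come back as English text; with the default it would come back as Japanese text.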

Which languages does the Whisper skill support?

Whisper supports 99 languages for transcription, including English, Spanish, French, German, Japanese, Chinese, and many more.

What is the difference between the model sizes like 'tiny' and 'large'?

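Smaller models like 'tiny' and 'base' are much faster and need only about 1 GB of VRAM, while 'large' offers the highest accuracy at the cost of more memory and slower processing. 'turbo' is a speed-optimized variant of 'large' that gives up a small amount of accuracy for much faster processing.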
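One way to act on this trade-off is to pick the largest model that fits the available VRAM. The sketch below uses the approximate figures published in the upstream Whisper README; treat them as rough guidance rather than guarantees. `pick_model` is an illustrative helper, and ranking models purely by memory need is a simplifying assumption:

```python
# (name, parameters, approximate VRAM in GB), ordered by memory requirement.
# Figures are approximate and taken from the upstream Whisper README.
MODELS = [
    ("tiny", "39M", 1),
    ("base", "74M", 1),
    ("small", "244M", 2),
    ("medium", "769M", 5),
    ("turbo", "809M", 6),
    ("large", "1550M", 10),
]


def pick_model(vram_gb: float) -> str:
    """Return the most memory-hungry model that still fits in vram_gb."""
    fitting = [name for name, _params, need_gb in MODELS if need_gb <= vram_gb]
    if not fitting:
        raise ValueError(f"No Whisper model fits in {vram_gb} GB of VRAM")
    return fitting[-1]
```

For example, `pick_model(8)` returns `"turbo"`, while `pick_model(24)` returns `"large"`.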

Whisper Speech Recognition

Name: Whisper Speech Recognition
Author: zechenzhangAGI

Category: Data Science and Machine Learning

Transcribes audio, translates speech to English, and automates multilingual audio processing using OpenAI's Whisper models.

Whisper is a robust speech-to-text skill that integrates OpenAI's general-purpose speech recognition model directly into your AI development workflow. It supports 99 languages and provides six different model sizes to balance speed and accuracy, making it ideal for automating podcast transcriptions, generating meeting notes, and translating foreign language audio into English. With features like word-level timestamps and noisy audio handling, it serves as a powerful tool for developers building multimodal applications or processing large-scale audio datasets.

Key Features

01 High-accuracy transcription across 99 different languages

02 Word-level timestamp generation for precise subtitle alignment

03 Multiple model scales from 39M (Tiny) to 1550M (Large) parameters

04 GPU acceleration support for up to 20x faster processing speeds

05 Seamless translation of non-English audio directly into English text

06 384 GitHub stars
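To illustrate how word-level timestamps can feed subtitle alignment, here is a minimal sketch that turns Whisper's word timings into SRT cues. It assumes the result shape produced by `model.transcribe(path, word_timestamps=True)`, where each segment carries a `words` list of dicts with `word`, `start`, and `end` keys. The helper names are illustrative, and one cue per word is chosen only to keep the example short:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(segments) -> str:
    """Emit one SRT cue per word from segments that include word timings."""
    words = [word for segment in segments for word in segment.get("words", [])]
    cues = []
    for index, word in enumerate(words, start=1):
        start = srt_timestamp(word["start"])
        end = srt_timestamp(word["end"])
        cues.append(f"{index}\n{start} --> {end}\n{word['word'].strip()}\n")
    return "\n".join(cues)
```

In practice the segments would come from `result["segments"]`; grouping several words per cue would usually read better on screen.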

Use Cases

01 Building automated workflows for technical documentation from voice memos

02 Automating subtitles and transcriptions for video and podcast content

03 Developing AI assistants that summarize multilingual meeting recordings
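A batch workflow along these lines could be sketched as follows, assuming the `openai-whisper` package is installed. The suffix list, the helper names, and the choice to write a `.txt` file next to each recording are all illustrative assumptions:

```python
from pathlib import Path

# Audio extensions to pick up; an assumed list, adjust as needed.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}


def find_audio_files(root) -> list:
    """Recursively collect audio files under root, sorted for a stable order."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )


def transcribe_folder(root, model_name: str = "base") -> list:
    """Transcribe every audio file under root and save the text beside it."""
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # load once, reuse for every file
    written = []
    for audio_path in find_audio_files(root):
        result = model.transcribe(str(audio_path))
        text_path = audio_path.with_suffix(".txt")
        text_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
        written.append(text_path)
    return written
```

Loading the model once outside the loop avoids repeating the most expensive setup step for every recording.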

Install with 🐟 Skill.Fish

npx skillfish add zechenzhangagi/ai-research-skills whisper

For use in Claude.ai and ChatGPT

Download Skill