01 Multimodal perception with video frames and audio transcriptions
02 Flexible backend options for audio processing (Gemini, local Whisper, OpenAI)
03 Adaptive video extraction parameters (fps, time range, resolution)
04 Automatic Whisper model download and installation
05 Interactive setup wizard for easy configuration
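The video-extraction feature names three parameters (fps, time range, resolution) but not how they fit together. The Python sketch that follows shows one plausible way they could combine. Every name in it (ExtractionParams, sample_timestamps, scaled_size, and the max_frames budget used to illustrate "adaptive") is hypothetical and not the project's actual API. It only decides which timestamps to sample and what size to resize frames to; decoding is left to whatever video tool the project really uses.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExtractionParams:
    """Hypothetical container for the knobs named in the feature list."""
    fps: float = 1.0                  # frames sampled per second of video
    start: float = 0.0                # start of the time range, in seconds
    end: Optional[float] = None       # end of the time range (None = end of video)
    max_side: Optional[int] = None    # cap on the longer frame side, in pixels
    max_frames: Optional[int] = None  # assumed frame budget; lowers fps if exceeded


def sample_timestamps(duration: float, p: ExtractionParams) -> List[float]:
    """Return evenly spaced timestamps (seconds) in [start, end)."""
    if p.fps <= 0:
        raise ValueError("fps must be positive")
    if p.max_frames is not None and p.max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    start = max(0.0, p.start)
    end = duration if p.end is None else min(p.end, duration)
    span = end - start
    if span <= 0:
        return []
    fps = p.fps
    if p.max_frames is not None:
        # "Adaptive" step (assumption): lower the rate so the range fits the budget.
        fps = min(fps, p.max_frames / span)
    # Number of frames whose timestamp falls strictly before `end`;
    # the tiny epsilon guards against float error in span * fps.
    n = math.ceil(span * fps - 1e-9)
    if p.max_frames is not None:
        n = min(n, p.max_frames)
    return [round(start + i / fps, 3) for i in range(n)]


def scaled_size(width: int, height: int, max_side: Optional[int]) -> Tuple[int, int]:
    """Shrink (never enlarge) so the longer side is at most max_side, keeping aspect ratio."""
    if max_side is None or max(width, height) <= max_side:
        return width, height
    scale = max_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(sample_timestamps(10.0, ExtractionParams(fps=1.0, start=2.0, end=5.0)))
    print(sample_timestamps(100.0, ExtractionParams(fps=1.0, max_frames=20)))
    print(scaled_size(1920, 1080, 768))
```

For a 100-second clip with fps=1.0 and max_frames=20, the sketch lowers the rate to one frame every 5 seconds, yielding 20 timestamps from 0.0 to 95.0, and scaled_size(1920, 1080, 768) returns (768, 432), preserving the 16:9 aspect ratio. Capping the frame count rather than a fixed fps keeps long videos within a model's input budget, which is one reasonable reading of "adaptive" here, not a confirmed description of the project.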