What happens if I run out of API credits?

If the API returns an HTTP 402 error indicating insufficient credits, the skill will stop immediately and notify you that your credits are exhausted.
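
A client that follows this rule could check the status code of each API response and abort on 402 (Payment Required). The Python below is a hypothetical illustration, not the skill's actual code. The names `InsufficientCreditsError` and `check_response_status` are invented for the example. Treating every other 4xx/5xx status as a generic failure is also an assumption.

```python
from http import HTTPStatus


class InsufficientCreditsError(RuntimeError):
    """Hypothetical error raised when the API reports HTTP 402 (Payment Required)."""


def check_response_status(status_code: int) -> None:
    """Stop processing immediately if the API reports exhausted credits.

    Hypothetical helper: mirrors the documented behaviour, not the skill's code.
    """
    if status_code == HTTPStatus.PAYMENT_REQUIRED:  # 402
        raise InsufficientCreditsError(
            "API credits exhausted (HTTP 402); top up your credits before retrying."
        )
    if status_code >= 400:
        # Assumption: other error statuses are reported as generic failures.
        raise RuntimeError(f"API request failed with HTTP {status_code}")


# Usage: the caller stops and tells the user instead of retrying.
try:
    check_response_status(402)
except InsufficientCreditsError as err:
    print(f"Stopping: {err}")
```

Raising a dedicated exception type lets the caller tell "out of credits" apart from transient failures, so it can stop rather than retry.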

What are the file size limits for media uploads?

Images are limited to 20 MB per file; video and audio files are limited to 100 MB per file.
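
Checking the size before uploading avoids a wasted request. The Python below is a minimal, hypothetical pre-upload check. It reads "MB" as 1024 × 1024 bytes, which is an assumption because the limits above do not say whether binary or decimal megabytes are meant. The `within_size_limit` and `check_file` helpers are invented for illustration.

```python
from pathlib import Path

# Assumption: "MB" is interpreted as 1024 * 1024 bytes; the documentation
# does not say whether the skill uses binary or decimal megabytes.
MB = 1024 * 1024

# Per-file limits stated in the FAQ.
SIZE_LIMITS = {
    "image": 20 * MB,
    "video": 100 * MB,
    "audio": 100 * MB,
}


def within_size_limit(size_bytes: int, kind: str) -> bool:
    """Return True if a file of size_bytes is within the limit for `kind`."""
    return size_bytes <= SIZE_LIMITS[kind]


def check_file(path: str, kind: str) -> None:
    """Hypothetical helper: raise ValueError if the file exceeds its limit."""
    size = Path(path).stat().st_size
    if not within_size_limit(size, kind):
        limit_mb = SIZE_LIMITS[kind] // MB
        raise ValueError(
            f"{path} is {size / MB:.1f} MB; {kind} files are limited to {limit_mb} MB"
        )
```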

Which AI model does the media-understand skill use?

By default, it uses the google/gemini-2.5-pro model via OpenRouter, but you can override this by using the --model parameter in your command.
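
The default-plus-override behaviour maps onto an ordinary command-line option with a default value. The following Python `argparse` sketch is hypothetical: only the `--media` and `--model` flags and the default model ID come from this FAQ. The parser itself is an assumption. So is leaving out the temperature and token-limit options, whose flag names are not documented here.

```python
import argparse

# Default model ID as stated in the FAQ (routed via OpenRouter).
DEFAULT_MODEL = "google/gemini-2.5-pro"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser modelling the documented --media and --model flags."""
    parser = argparse.ArgumentParser(description="media-understand (sketch)")
    parser.add_argument("--media", required=True,
                        help="local file path or YouTube URL to analyze")
    parser.add_argument("--model", default=DEFAULT_MODEL,
                        help="model ID to use instead of the default")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Analyzing {args.media} with {args.model}")
```

Passing only `--media clip.mp4` would resolve to `google/gemini-2.5-pro`. Adding `--model` with another model ID replaces the default.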

Can I use this skill to analyze YouTube videos without downloading them?

Yes, you can provide a direct YouTube URL to the --media parameter, and the skill will process the video content for summarization or analysis.

What media formats are supported by this skill?

The skill supports common image formats (JPG, PNG, GIF, WebP), video formats (MP4, MOV, WEBM, YouTube URLs), and audio formats (MP3, WAV, AAC, M4A, FLAC).
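
When deciding how to treat an input, a caller could first check whether it is a YouTube URL and otherwise fall back to the file extension. The Python below is a hypothetical sketch built from the format list above. Several details are assumptions: the set of YouTube host names, the `.jpeg` alias for JPG, and rejecting other remote URLs. The skill may detect formats differently, for example by MIME type.

```python
import os
from urllib.parse import urlparse

# Extensions from the FAQ; ".jpeg" is included as an assumed alias of JPG.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".webm"}
AUDIO_EXTS = {".mp3", ".wav", ".aac", ".m4a", ".flac"}

# Assumption: common YouTube host names; not taken from the skill's code.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_media(source: str) -> str:
    """Return "image", "video" or "audio" for a local path or YouTube URL."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        if (parsed.hostname or "") in YOUTUBE_HOSTS:  # hostname is lowercased
            return "video"
        raise ValueError(f"This sketch only accepts YouTube URLs as remote media: {source}")
    ext = os.path.splitext(source)[1].lower()
    for kind, exts in (("image", IMAGE_EXTS), ("video", VIDEO_EXTS), ("audio", AUDIO_EXTS)):
        if ext in exts:
            return kind
    raise ValueError(f"Unsupported media format: {ext or source}")
```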

Media Understanding & Analysis

Name: Media Understanding & Analysis
Author: maxgent-ai

Data Science & ML

Analyzes and extracts insights from images, videos, and audio files using advanced AI models.

Media Understanding is a Claude Code skill for AI-powered analysis of images, videos, and audio files. It uses the Maxgent FAL API proxy and models such as Gemini 2.5 Pro to perform OCR, summarize video content (including directly from YouTube URLs), and transcribe or analyze audio recordings. The result is text-based output you can act on, produced through a single interface inside your development workflow.

Key Features

01 Customizable analysis with adjustable model IDs, temperature, and token limits.

02 Multi-format support for images, videos, and audio files.

03 OCR for extracting text from screenshots, diagrams, and documents.

04 Direct YouTube URL processing for video summarization and analysis.

05 Audio analysis for transcribing and summarizing meeting recordings or voice notes.

Use Cases

01 Automating text extraction from software screenshots for technical documentation.

02 Analyzing audio meeting recordings to automatically generate summaries and action items.

03 Summarizing long YouTube tutorials or webinars into concise, actionable bullet points.
