What is the best way to order content for vision tasks?

Place image and document blocks before the text block in the content array. Image-first ordering tends to improve Claude's reasoning over the visual content.
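A minimal sketch of image-first ordering, assuming the Messages API content-block shapes (an `image` block with a base64 `source`, and a `text` block). The helper name `buildVisionContent`, the `EncodedImage` type, and the placeholder data strings are hypothetical and only serve the illustration.

```typescript
// Content-block shapes assumed for a Messages API `content` array.
type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: ImageMediaType; data: string };
}

interface TextBlock {
  type: "text";
  text: string;
}

type ContentBlock = ImageBlock | TextBlock;

// Hypothetical input type: an image that has already been base64-encoded.
interface EncodedImage {
  mediaType: ImageMediaType;
  base64Data: string;
}

// Hypothetical helper: emits every image block first and the text prompt last.
function buildVisionContent(images: EncodedImage[], prompt: string): ContentBlock[] {
  const imageBlocks = images.map((img): ImageBlock => ({
    type: "image",
    source: { type: "base64", media_type: img.mediaType, data: img.base64Data },
  }));
  const promptBlock: TextBlock = { type: "text", text: prompt };
  return [...imageBlocks, promptBlock];
}

// Placeholder strings stand in for real base64 payloads.
const comparisonContent = buildVisionContent(
  [
    { mediaType: "image/png", base64Data: "<base64 of design A>" },
    { mediaType: "image/png", base64Data: "<base64 of design B>" },
  ],
  "Compare these two designs and list the differences.",
);

console.log(comparisonContent.map((block) => block.type).join(", "));
// prints: image, image, text
```

The resulting array would be passed as the `content` of a user message. Keeping the ordering in one helper means callers cannot accidentally put the prompt ahead of the images.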

What is the difference between image and document types?

Use type: 'image' for image files (JPEG, PNG, GIF, WebP) and type: 'document' for PDFs. A document block lets Claude work from both the extracted text and a rendered image of each page.
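One way to keep the two block types straight is to choose the type from the file's media type. The sketch below assumes base64 sources for both block shapes. The helper name `toFileBlock` and the placeholder payloads are hypothetical.

```typescript
type RasterMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";
type PdfMediaType = "application/pdf";
type SupportedMediaType = RasterMediaType | PdfMediaType;

interface ImageContentBlock {
  type: "image";
  source: { type: "base64"; media_type: RasterMediaType; data: string };
}

interface DocumentContentBlock {
  type: "document";
  source: { type: "base64"; media_type: PdfMediaType; data: string };
}

// Hypothetical helper: PDFs become `document` blocks, every other supported
// media type becomes an `image` block.
function toFileBlock(
  mediaType: SupportedMediaType,
  base64Data: string,
): ImageContentBlock | DocumentContentBlock {
  if (mediaType === "application/pdf") {
    return {
      type: "document",
      source: { type: "base64", media_type: mediaType, data: base64Data },
    };
  }
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data: base64Data },
  };
}

// Placeholder strings stand in for real base64 payloads.
const invoiceBlock = toFileBlock("application/pdf", "<base64 of invoice.pdf>");
const receiptBlock = toFileBlock("image/jpeg", "<base64 of receipt.jpg>");

console.log(invoiceBlock.type, receiptBlock.type);
// prints: document image
```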

Does this skill require an external OCR library for PDFs?

No, Claude reads text directly from images and PDFs using its native multimodal capabilities, eliminating the need for separate OCR tools.

How are tokens calculated for images in Claude?

Image token cost scales with pixel count and can be estimated as tokens ≈ (width * height) / 750, with width and height in pixels.
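The formula translates directly into a small estimator. This is a sketch only: rounding up to a whole token is an assumption, and `scaleToTokenBudget` is a hypothetical helper showing how the same formula could drive a resize decision, not the skill's actual resizing logic.

```typescript
// Divisor from the formula above: tokens = (width * height) / 750.
const PIXELS_PER_TOKEN = 750;

// Estimates the token cost of an image from its pixel dimensions.
// Rounding up to a whole token is an assumption made for this sketch.
function estimateImageTokens(widthPx: number, heightPx: number): number {
  return Math.ceil((widthPx * heightPx) / PIXELS_PER_TOKEN);
}

// Hypothetical helper: returns the uniform scale factor (at most 1) to apply
// to both dimensions so the estimate fits within `maxTokens`.
function scaleToTokenBudget(widthPx: number, heightPx: number, maxTokens: number): number {
  if (estimateImageTokens(widthPx, heightPx) <= maxTokens) {
    return 1; // already within budget, no resize needed
  }
  // Area scales with the square of the factor, hence the square root.
  return Math.sqrt((maxTokens * PIXELS_PER_TOKEN) / (widthPx * heightPx));
}

console.log(estimateImageTokens(1000, 1000)); // prints: 1334
console.log(scaleToTokenBudget(3000, 2000, 2000)); // prints: 0.5
```

In the second call, a 3000 x 2000 image is estimated at 8000 tokens. Scaling both sides by 0.5 gives 1500 x 1000, which is estimated at exactly 2000 tokens.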

Claude Vision & PDF Analysis

Name: Claude Vision & PDF Analysis
Author: agents-inc

Data Science & ML

Enables advanced image understanding and multi-page PDF processing using Claude's multimodal capabilities.

This skill provides standardized patterns and best practices for integrating Claude's vision and document analysis capabilities into your applications. It covers everything from handling various image formats and PDF document blocks to multi-image comparison and structured data extraction using Zod schemas. With built-in token cost estimation and optimization strategies, it ensures efficient multimodal processing while adhering to Anthropic's specific implementation requirements like image-first ordering and mandatory token limits.

Key Features

01 Structured data extraction from visual content using Zod schemas

02 Automated token cost estimation and intelligent image resizing logic

03 5 GitHub stars

04 Native PDF processing without the need for external OCR libraries

05 Optimized image-first content ordering for improved reasoning accuracy

06 Multimodal support for JPEG, PNG, GIF, WebP, and PDF formats

Use Cases

01 Building visual comparison tools for A/B testing or design reviews

02 Summarizing and analyzing complex multi-page PDF documents

03 Extracting structured data from screenshots, receipts, and invoices
