Introduction
The Vision & Multimodal skill lets Claude work with visual inputs alongside text, including images, screenshots, and complex documents. It provides standardized patterns for encoding visual media for the API, extracting text from images (OCR-style), and analyzing technical charts and diagrams. Typical uses include automating document workflows, auditing UI layouts, and extracting data from receipts. The skill also covers resizing images before upload. Image token cost grows with pixel count, so downscaling oversized images is the main lever for reducing token consumption.
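The encoding and resizing patterns can be sketched as follows. This is a minimal illustration, not the skill's own implementation. The limits are assumptions taken from Anthropic's vision guide at the time of writing: tokens are roughly `(width * height) / 750`, and images whose long edge exceeds 1568 px or that cost more than about 1,600 tokens get downscaled server-side. Check the current documentation before relying on them. The helper names `estimate_image_tokens`, `target_size`, and `image_block` are hypothetical. The block dictionary follows the Messages API base64 image content shape.

```python
import base64
import math

# Assumed limits from Anthropic's vision guide at the time of writing;
# verify against current documentation before relying on them.
MAX_LONG_EDGE_PX = 1568   # long edges beyond this are downscaled server-side
MAX_IMAGE_TOKENS = 1600   # approximate per-image token ceiling
PIXELS_PER_TOKEN = 750    # tokens ~= (width * height) / 750


def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate (rounded up) token cost of an image of the given size."""
    return math.ceil(width * height / PIXELS_PER_TOKEN)


def target_size(width: int, height: int) -> tuple[int, int]:
    """Largest same-aspect-ratio size that satisfies both assumed limits.

    Images already within the limits are returned unchanged (no upscaling).
    """
    scale = min(
        1.0,
        MAX_LONG_EDGE_PX / max(width, height),
        math.sqrt(MAX_IMAGE_TOKENS * PIXELS_PER_TOKEN / (width * height)),
    )
    # Floor so rounding never pushes the result back over a limit.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


def image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(image_bytes).decode("ascii"),
        },
    }


if __name__ == "__main__":
    # Example: plan a downscale for a 4032x3024 (12 MP) phone photo.
    w, h = target_size(4032, 3024)
    print(w, h, estimate_image_tokens(w, h))
```

`target_size` only computes dimensions. Resampling the pixels would still be done with an imaging library such as Pillow (for example `Image.thumbnail`) before the bytes are passed to `image_block`. The resulting dictionary goes into a user message's `content` list next to a `{"type": "text", ...}` block that carries the instruction, such as "extract the line items from this receipt".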