What file formats does the OCR skill support?

The skill supports a wide range of formats including .png, .jpg, .jpeg, .tiff, .bmp, .webp, and multi-page scanned .pdf files.

Can I process documents in languages other than English?

Yes, the skill supports over 100 languages. You can specify your target languages using the --lang flag, for example: /llamafarm:ocr --lang en,de,fr.
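As a minimal sketch, the command string can be assembled from a list of language codes. The helper name `build_ocr_command` is hypothetical and not part of the skill; only the `/llamafarm:ocr` command and the `--lang` flag with comma-separated codes come from the answer above.

```python
def build_ocr_command(languages):
    """Return the slash command for the given language codes (hypothetical helper)."""
    # Normalize codes and drop empty entries.
    codes = [code.strip().lower() for code in languages if code.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    # The --lang flag takes a comma-separated list, e.g. en,de,fr.
    return "/llamafarm:ocr --lang " + ",".join(codes)


print(build_ocr_command(["en", "de", "fr"]))  # prints: /llamafarm:ocr --lang en,de,fr
```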

Do I need an active internet connection to use this OCR skill?

Processing runs locally. The skill communicates with a local LlamaFarm ML runtime, so once that runtime is started (using /llamafarm:start), documents are processed on your machine.

How do I handle low-confidence extraction results?

The skill returns a confidence score for every extraction. If a score is low, you can try re-running the command with a different backend better suited for that specific document type.
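The retry idea above might be sketched as follows. Everything here is an assumption made for illustration: the `run_ocr` callable, the result shape (`text` plus a `confidence` between 0 and 1), and the 0.8 threshold are hypothetical, since the listing does not document the skill's actual output format or score scale.

```python
def extract_with_fallback(run_ocr, backends, threshold=0.8):
    """Try backends in order; stop at the first result meeting the threshold.

    run_ocr is a hypothetical callable: backend name -> {"text": str, "confidence": float}.
    If no backend reaches the threshold, the highest-confidence result is returned.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    best = None
    for backend in backends:
        result = dict(run_ocr(backend), backend=backend)
        if best is None or result["confidence"] > best["confidence"]:
            best = result
        if result["confidence"] >= threshold:
            break  # good enough, skip the remaining backends
    return best


# Stub runner standing in for the real skill (scores are made up for the demo).
fake_scores = {"surya": 0.55, "easyocr": 0.91}

def fake_run(backend):
    return {"text": f"text from {backend}", "confidence": fake_scores[backend]}

result = extract_with_fallback(fake_run, ["surya", "easyocr", "tesseract"])
print(result["backend"], result["confidence"])  # prints: easyocr 0.91
```

Returning the best result seen, rather than failing outright, keeps a usable extraction available even when every backend scores below the threshold.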

Which OCR backend should I choose for my document?

Surya is the best all-rounder for general documents. Use EasyOCR for handwriting or scene text, PaddleOCR for CJK (Chinese, Japanese, Korean) languages, and Tesseract for simple, high-speed legacy document processing.
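The guidance above can be captured as a small lookup table. This is a sketch only: the document-type keys and the lowercase backend identifiers are hypothetical labels chosen for illustration, not names confirmed by the skill.

```python
# Mapping from a rough document category to the backend recommended above.
# Keys and backend identifiers are illustrative assumptions.
BACKEND_BY_DOCUMENT_TYPE = {
    "general": "surya",         # best all-rounder
    "handwriting": "easyocr",
    "scene_text": "easyocr",
    "cjk": "paddleocr",         # Chinese, Japanese, Korean
    "legacy": "tesseract",      # simple, high-speed processing
}


def choose_backend(document_type):
    """Return the suggested backend, defaulting to the all-rounder."""
    return BACKEND_BY_DOCUMENT_TYPE.get(document_type, "surya")


print(choose_backend("cjk"))      # prints: paddleocr
print(choose_backend("invoice"))  # prints: surya
```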

Optical Character Recognition (OCR)

Name: Optical Character Recognition (OCR)
Author: llama-farm


Data Science & ML

Extracts structured text from images and scanned documents using multiple high-performance OCR engines.

The LlamaFarm OCR skill lets Claude convert images and scanned PDFs into machine-readable text directly within your development environment. With specialized backends such as Surya, PaddleOCR, and EasyOCR, it can process standard business documents, CJK text, handwritten notes, and scene text. It returns layout information and confidence scores alongside the extracted text, which makes it useful for digitizing legacy data, indexing documents for RAG, or automating data entry.

Key Features

01 Seamless integration with downstream tasks like classification and entity extraction

02 Processes common image formats and multi-page scanned PDF documents

03 Support for 100+ languages, including CJK, plus handwriting recognition

04 Supports multiple backends including Surya, EasyOCR, PaddleOCR, and Tesseract

05 Detailed output including extraction confidence scores and layout metadata

06 0 GitHub stars

Use Cases

01 Converting screenshots of terminal output or UI designs into editable text for documentation

02 Extracting content from scanned research papers to build local RAG knowledge bases

03 Digitizing paper invoices and receipts for automated data entry and accounting


Install with 🐟 Skill.Fish

npx skillfish add llama-farm/claude-code-marketplace ocr
