How does this skill handle large PDF files?

It uses a 'Probe First' strategy to check page counts and content types, then employs chunked text extraction to avoid Claude's 100-image context limit.
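
The chunking idea can be sketched roughly as follows. This is an illustration only, not the skill's actual script: the names `page_chunks` and `iter_text_chunks` and the default chunk size of 20 pages are assumptions made for this example, and `pdfplumber` is imported lazily so the range helper works without it.

```python
from typing import Iterator, List, Tuple


def page_chunks(total_pages: int, chunk_size: int = 20) -> List[Tuple[int, int]]:
    """Return (start, end) page ranges, zero-based and end-exclusive."""
    if total_pages < 0 or chunk_size <= 0:
        raise ValueError("total_pages must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def iter_text_chunks(path: str, chunk_size: int = 20) -> Iterator[str]:
    """Yield the extracted text of one page range at a time."""
    import pdfplumber  # third-party; imported here so page_chunks has no dependencies

    with pdfplumber.open(path) as pdf:
        for start, end in page_chunks(len(pdf.pages), chunk_size):
            # extract_text() may return None or "" for pages without a text layer
            texts = [(page.extract_text() or "") for page in pdf.pages[start:end]]
            yield "\n".join(texts)
```

Reading one chunk at a time keeps each step's text small, and no page images are sent at all.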

Can I use it to create new PDFs from scratch?

Yes. It supports the reportlab library, so you can generate reports, letters, and multi-page documents programmatically.
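
A minimal sketch of multi-page generation with reportlab's canvas API, assuming plain text lines and a fixed number of lines per page. The helper names `paginate` and `write_report`, the margins, and the fonts are choices made for this example, not taken from the skill.

```python
from typing import List


def paginate(lines: List[str], lines_per_page: int = 40) -> List[List[str]]:
    """Split a flat list of lines into per-page groups."""
    if lines_per_page <= 0:
        raise ValueError("lines_per_page must be > 0")
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]


def write_report(path: str, title: str, lines: List[str]) -> int:
    """Write the lines to a PDF and return the number of pages produced."""
    from reportlab.lib.pagesizes import letter  # third-party, imported lazily
    from reportlab.pdfgen import canvas

    _, height = letter
    pdf = canvas.Canvas(path, pagesize=letter)
    pages = paginate(lines) or [[]]  # always emit at least one page
    for number, page_lines in enumerate(pages, start=1):
        pdf.setFont("Helvetica-Bold", 14)
        pdf.drawString(72, height - 72, f"{title} (page {number})")
        pdf.setFont("Helvetica", 11)
        y = height - 100
        for line in page_lines:
            pdf.drawString(72, y, line)
            y -= 16  # move down one line
        pdf.showPage()  # finish the current page
    pdf.save()
    return len(pages)
```

For example, `write_report("report.pdf", "Weekly summary", ["line one", "line two"])` would produce a one-page document.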

Does it support scanned documents?

Yes, the skill includes a path for OCR processing using pytesseract and pdf2image when standard text extraction returns empty results.
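
The fallback could look something like the sketch below. The threshold of 20 non-whitespace characters and the function names are assumptions for illustration. The OCR step is passed in as a callable so the decision logic can be exercised without Tesseract installed.

```python
from typing import Callable, List


def needs_ocr(text: str, min_chars: int = 20) -> bool:
    """Treat a page as scanned when extraction yields almost no non-whitespace text."""
    return len("".join(text.split())) < min_chars


def text_with_ocr_fallback(
    page_texts: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = 20,
) -> List[str]:
    """Keep extracted text where present; call ocr_page(index) for empty-looking pages."""
    return [
        ocr_page(index) if needs_ocr(text, min_chars) else text
        for index, text in enumerate(page_texts)
    ]


def tesseract_ocr_page(path: str, dpi: int = 300) -> Callable[[int], str]:
    """Build an ocr_page callable backed by pdf2image and pytesseract."""
    from pdf2image import convert_from_path  # third-party, imported lazily
    import pytesseract

    def ocr_page(index: int) -> str:
        # pdf2image uses 1-based page numbers
        images = convert_from_path(path, dpi=dpi, first_page=index + 1, last_page=index + 1)
        return pytesseract.image_to_string(images[0])

    return ocr_page
```

Only pages that look empty are rasterized, so documents with a mix of digital and scanned pages avoid unnecessary OCR work.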

Can it extract data from tables within a PDF?

Yes, it uses the pdfplumber library to extract tables and can convert them into Pandas DataFrames for further analysis.
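
As a rough sketch: pdfplumber returns each table as a list of rows, where a cell is a string or `None`. The example below assumes the first row is the header, which will not hold for every PDF. The names `table_to_records` and `tables_as_dataframes` are hypothetical.

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]


def table_to_records(table: List[Row]) -> List[Dict[str, str]]:
    """Use the first row as the header and map None cells to empty strings."""
    if not table:
        return []
    # Blank header cells get a positional placeholder name such as col_1
    header = [(cell or "").strip() or f"col_{i}" for i, cell in enumerate(table[0])]
    return [
        {name: (cell or "").strip() for name, cell in zip(header, row)}
        for row in table[1:]
    ]


def tables_as_dataframes(path: str, page_index: int = 0):
    """Return one DataFrame per table found on the given page."""
    import pandas as pd  # third-party, imported lazily
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        tables = pdf.pages[page_index].extract_tables()
    return [pd.DataFrame(table_to_records(table)) for table in tables]
```

Cells arrive as strings, so numeric columns usually still need converting (for instance with `pandas.to_numeric`) before analysis.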

How do I avoid context budget issues when reading PDFs?

The skill provides a probe script to determine the best reading strategy, often recommending text extraction over image-based reading for dense documents.
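
The probe's decision might be sketched as below. This is not the skill's actual script: the `Probe` fields, the thresholds, and the strategy labels are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    page_count: int
    avg_chars_per_page: float  # measured on a sample of pages


def choose_strategy(probe: Probe, min_text_chars: int = 50, chunk_threshold: int = 20) -> str:
    """Pick a reading strategy from probe results (hypothetical labels)."""
    if probe.avg_chars_per_page < min_text_chars:
        return "ocr"  # no usable text layer, likely a scan
    if probe.page_count > chunk_threshold:
        return "chunked_text"  # long document: read text in page ranges
    return "full_text"  # short and text-based: extract everything at once


def probe_pdf(path: str, sample_pages: int = 5) -> Probe:
    """Count pages and estimate text density from the first few pages."""
    import pdfplumber  # third-party, imported lazily

    with pdfplumber.open(path) as pdf:
        sample = pdf.pages[:sample_pages]
        chars = sum(len(page.extract_text() or "") for page in sample)
        return Probe(len(pdf.pages), chars / max(len(sample), 1))
```

With these assumed thresholds, a dense 150-page report would be routed to chunked text extraction rather than image-based reading.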

PDF Toolkit for Claude

Author: Touricks

Content Management

Processes, generates, and analyzes PDF documents efficiently while managing context limits through smart reading strategies.

This skill provides a comprehensive suite of PDF manipulation tools designed to overcome the context limits of AI agents. It enables Claude to programmatically extract text and tables using pdfplumber, merge or split documents with pypdf, and generate high-quality PDFs using reportlab. A critical feature is its 'Smart Reading' logic, which probes files to prevent context overflow by using chunked text extraction instead of image-based reading, making it ideal for processing everything from single-page forms to massive 150+ page technical reports.

Key Features

01 2 GitHub stars

02 OCR support for scanned documents via pytesseract and pdf2image

03 Smart Reading logic to prevent token overflow and image limit failures

04 Document manipulation including merging, splitting, rotating, and encrypting

05 High-fidelity text and table extraction using pdfplumber and pdftotext

06 Programmatic PDF generation and report creation with reportlab
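
Splitting and merging with pypdf can be sketched as follows. The page-spec syntax ("1-3,5") and the function names are assumptions for this example and are not part of the skill or of pypdf.

```python
from typing import List


def parse_page_spec(spec: str, total_pages: int) -> List[int]:
    """Turn a 1-based spec such as "1-3,5" into zero-based page indices."""
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start, end = int(first), int(last or first)
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"page range {part!r} is outside 1..{total_pages}")
        indices.extend(range(start - 1, end))
    return indices


def extract_pages(src: str, dst: str, spec: str) -> None:
    """Copy the pages selected by spec from src into a new PDF at dst."""
    from pypdf import PdfReader, PdfWriter  # third-party, imported lazily

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in parse_page_spec(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as handle:
        writer.write(handle)


def merge_pdfs(sources: List[str], dst: str) -> None:
    """Concatenate the source PDFs, in order, into dst."""
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for src in sources:
        for page in PdfReader(src).pages:
            writer.add_page(page)
    with open(dst, "wb") as handle:
        writer.write(handle)
```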

Use Cases

01 Analyzing large technical manuals or legal documents without hitting API limits

02 Automating the extraction of complex data from PDF tables into DataFrames

03 Generating automated business reports, invoices, or watermarked documents

Install with 🐟 Skill.Fish

npx skillfish add touricks/fanshi_personal_skills pdf

For use in Claude.ai and ChatGPT
