How are tables handled during the extraction process?

The skill uses specialized table parsers like pdfplumber to detect visual grid patterns and convert them into clean Markdown tables, preserving the original data structure.
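
As an illustration of the table step, here is a minimal sketch, not the skill's actual code. It assumes pdfplumber's `page.extract_tables()` output (a list of rows whose cells are strings or `None`); the helper names `rows_to_markdown` and `extract_tables_as_markdown` are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[str]]


def rows_to_markdown(rows: List[Row]) -> str:
    """Render extracted rows as a Markdown table; the first row is treated as the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell: Optional[str]) -> str:
        # Empty cells arrive as None; newlines and pipes would break a Markdown row.
        text = "" if cell is None else str(cell)
        return " ".join(text.split()).replace("|", "\\|")

    def render(row: Row) -> str:
        # Pad short rows so every line has the same number of columns.
        cells = [clean(c) for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [render(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)


def extract_tables_as_markdown(pdf_path: str) -> List[str]:
    """Hypothetical wrapper: one Markdown table per table pdfplumber detects."""
    import pdfplumber  # third-party; imported lazily so the helper above has no dependencies

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                tables.append(rows_to_markdown(table))
    return tables
```

Treating the first row as the header is an assumption; tables without a header row would need a different rule.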

Does it support password-protected PDF files?

Yes, the skill allows you to provide a password during the extraction process to decrypt and access protected content.
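
How the skill accepts the password is not documented here. The sketch below only shows one plausible way to pass it through, using the `password` argument of `pdfplumber.open`; the function names `open_options` and `first_page_text` are hypothetical.

```python
from typing import Dict, Optional


def open_options(password: Optional[str]) -> Dict[str, str]:
    """Build keyword arguments for opening a PDF.

    The password is included only when one was supplied, so that
    unprotected files are opened with the library's default behaviour.
    """
    return {"password": password} if password else {}


def first_page_text(pdf_path: str, password: Optional[str] = None) -> str:
    """Hypothetical helper: open a (possibly encrypted) PDF and read page one."""
    import pdfplumber  # third-party; imported lazily

    with pdfplumber.open(pdf_path, **open_options(password)) as pdf:
        # extract_text() may return None for pages without a text layer.
        return pdf.pages[0].extract_text() or ""
```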

Can this skill handle scanned PDFs that don't have selectable text?

Yes, the skill includes OCR (Optical Character Recognition) capabilities via Tesseract to extract text from scanned images and unsearchable PDF files.
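
A rough sketch of an OCR fallback, under stated assumptions: the page is rasterized with `pdf2image` and read with `pytesseract` (both are third-party wrappers, and the source only confirms that Tesseract is used). The `needs_ocr` threshold of 20 characters is an arbitrary, hypothetical heuristic.

```python
from typing import Optional


def needs_ocr(extracted_text: Optional[str], min_chars: int = 20) -> bool:
    """Guess whether a page is a scan: little or no text came out of the text layer."""
    return len((extracted_text or "").strip()) < min_chars


def ocr_page(pdf_path: str, page_number: int, dpi: int = 300) -> str:
    """Rasterize one page (1-based page_number) and run Tesseract on the image."""
    from pdf2image import convert_from_path  # requires Poppler to be installed
    import pytesseract  # requires the tesseract binary to be installed

    images = convert_from_path(
        pdf_path, dpi=dpi, first_page=page_number, last_page=page_number
    )
    return pytesseract.image_to_string(images[0])
```

A caller would typically try normal text extraction first and call `ocr_page` only when `needs_ocr` returns `True`, since OCR is much slower than reading an existing text layer.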

Will it work with very large PDF documents?

Yes, the skill features a parallel processing mode and chunked extraction (saving progress every few pages) to prevent memory errors and ensure reliability for large files.
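
The chunking idea can be sketched as follows. This is an assumption about how "saving progress every few pages" might work, not the skill's implementation: pages are processed in fixed-size ranges and a JSON checkpoint (hypothetical format) is rewritten after each range so an interrupted run can resume. Parallel processing is not shown.

```python
import json
import os
from typing import Callable, Iterator, List, Tuple


def page_chunks(
    total_pages: int, chunk_size: int, first_page: int = 0
) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) page ranges of at most chunk_size pages."""
    for start in range(first_page, total_pages, chunk_size):
        yield start, min(start + chunk_size, total_pages)


def run_chunked(
    total_pages: int,
    chunk_size: int,
    extract_chunk: Callable[[int, int], str],
    checkpoint_path: str,
) -> List[str]:
    """Extract chunk by chunk, saving progress after every chunk."""
    state = {"next_page": 0, "parts": []}
    if os.path.exists(checkpoint_path):
        # Resume from the last page range that was fully written.
        with open(checkpoint_path, encoding="utf-8") as fh:
            state = json.load(fh)

    for start, end in page_chunks(total_pages, chunk_size, state["next_page"]):
        state["parts"].append(extract_chunk(start, end))
        state["next_page"] = end
        with open(checkpoint_path, "w", encoding="utf-8") as fh:
            json.dump(state, fh)
    return state["parts"]
```

Here `extract_chunk` stands in for whatever function extracts pages `start` to `end - 1`; keeping only one chunk in flight at a time is what bounds memory use.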

What happens if a PDF file is corrupted?

The skill includes a grounding checkpoint that runs metadata checks before processing; if a file is corrupt, it triggers a recovery protocol to attempt repairs using qpdf.
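
The exact checks the skill runs are not described here. As a hedged sketch, a pre-flight check could look for the PDF header and end-of-file marker, and a repair attempt could ask qpdf to rewrite the file. The function names are hypothetical, and treating qpdf exit codes 0 (success) and 3 (warnings) as usable output is this sketch's choice.

```python
import subprocess


def looks_like_pdf(data: bytes) -> bool:
    """Cheap structural check: header near the start, EOF marker near the end."""
    head_ok = b"%PDF-" in data[:1024]
    tail_ok = b"%%EOF" in data[-1024:]
    return head_ok and tail_ok


def attempt_repair(src_path: str, dst_path: str) -> bool:
    """Ask qpdf to rewrite src_path into dst_path, which can recover some damaged files.

    Raises FileNotFoundError if the qpdf binary is not installed.
    """
    result = subprocess.run(
        ["qpdf", src_path, dst_path], capture_output=True, text=True
    )
    # qpdf exits with 0 on success, 3 when it succeeded with warnings, 2 on errors.
    return result.returncode in (0, 3)
```

A file that fails `looks_like_pdf` is not necessarily unrecoverable, so a caller might still try `attempt_repair` before giving up.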

PDF Content Extractor

Name: PDF Content Extractor
Author: jmagly

Content Management

Extracts structured text, tables, and images from PDF documents to create searchable, organized Markdown content.

The PDF Extractor skill lets Claude parse complex PDF files on its own, including text-based documents, scanned pages that require OCR, and tables. It provides a workflow for converting manuals, reports, and technical documentation into Markdown. Using tools such as pdfplumber and Tesseract, the skill handles password-protected files, processes large files in parallel, and verifies metadata before extraction to prevent execution errors.

Key Features

01 Structured output delivery including Markdown references and image assets

02 Parallel processing and chunking for high-performance extraction of large PDFs

03 Comprehensive metadata inspection and grounding checks

04 High-fidelity text and table extraction using pdfplumber

05 67 GitHub stars

06 Integrated OCR support for scanned documents and images

Use Cases

01 Extracting tabular data from financial reports for automated analysis

02 Converting technical manuals into searchable documentation repositories

03 Digitizing legacy scanned documents into structured agent-ready context

Install with 🐟 Skill.Fish

npx skillfish add jmagly/ai-writing-guide pdf-extractor
