Does this skill support scanned PDFs?

Yes, it includes OCR support using tools like Tesseract to extract text from scanned images and low-quality PDF documents.

How does it handle large PDF documents?

It uses a chunked processing strategy with internal checkpointing: progress is saved per page range, so a long-running job stays stable and can resume after an interruption.
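
The page-range checkpointing described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the skill's actual implementation: the function names, the JSON checkpoint layout, and the `extract_range` callback (a stand-in for whatever does the real PDF extraction) are all hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, Iterator


def page_ranges(total_pages: int, chunk_size: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive, 1-based (start, end) page ranges covering the document."""
    for start in range(1, total_pages + 1, chunk_size):
        yield start, min(start + chunk_size - 1, total_pages)


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or an empty state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return {"done": [], "results": {}}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp_path, path)


def process_document(
    total_pages: int,
    chunk_size: int,
    checkpoint_path: str,
    extract_range: Callable[[int, int], str],  # hypothetical extractor callback
) -> dict:
    """Process the document chunk by chunk, skipping ranges already checkpointed."""
    state = load_checkpoint(checkpoint_path)
    done = {tuple(r) for r in state["done"]}
    for start, end in page_ranges(total_pages, chunk_size):
        if (start, end) in done:
            continue  # finished in an earlier run
        state["results"][f"{start}-{end}"] = extract_range(start, end)
        state["done"].append([start, end])
        save_checkpoint(checkpoint_path, state)  # persist after every chunk
    return state
```

Because progress is persisted after each chunk, a failure partway through loses at most one page range of work; calling `process_document` again with the same checkpoint path picks up at the first unfinished range.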

Can it extract tables into Markdown format?

Yes, the skill is optimized to detect visual grid patterns and convert them into structured Markdown tables that are easy for AI models to interpret.
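
As a rough illustration of the final conversion step, the sketch below renders an already-detected grid of cells as a Markdown table. It assumes grid detection has happened upstream and that the first row is the header; `to_markdown_table` is a hypothetical helper, not part of the skill's documented interface.

```python
def to_markdown_table(rows: list[list[str]]) -> str:
    """Render a grid of cell strings as a Markdown table (first row = header)."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell: object) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the table.
        return str(cell).replace("|", "\\|").replace("\n", " ").strip()

    # Pad ragged rows so every row has the same number of columns.
    padded = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]

    def line(cells: list[str]) -> str:
        return "| " + " | ".join(cells) + " |"

    header, body = padded[0], padded[1:]
    return "\n".join([line(header), line(["---"] * width)] + [line(r) for r in body])


print(to_markdown_table([["Item", "Qty"], ["Bolt", "4"], ["Nut"]]))
```

Padding short rows and escaping `|` keeps the output a valid table even when the extracted grid is ragged or a cell contains a pipe character.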

Can it handle password-protected files?

Yes, the skill can process encrypted PDFs if the user provides the correct password when prompted during the extraction strategy phase.

PDF Content Extractor

Name: PDF Content Extractor
Author: jmagly

Content Management

Converts PDF documentation, manuals, and reports into organized, searchable text, tables, and images for agentic workflows.

The PDF Content Extractor skill enables Claude to autonomously parse and structure content from complex PDF documents, including text-based, scanned, and password-protected files. It applies grounding checks and escalates uncertain results rather than guessing, which supports high-quality extraction of tables, images, and formatted text. The skill is designed to mitigate common LLM failure modes on large documents, and it converts legacy documentation into structured, AI-ready formats with checkpoint support and error recovery.

Key Features

01 Checkpoint support for large document processing with resume capabilities

02 Structured extraction of complex tables and visual grid patterns

03 High-resolution image extraction with page-relative asset mapping

04 Robust error recovery for corrupted files and password-protected documents

05 67 GitHub stars

06 Automated detection of text-based vs. scanned (OCR) PDF formats
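
One common way to tell text-based pages from scanned ones is to check how much text the embedded text layer yields per page. The sketch below shows that heuristic under stated assumptions; the listing does not say how the skill actually performs detection. The function names and the `min_chars` threshold are hypothetical, and the input is assumed to be the per-page text already pulled from the PDF's text layer by some extractor.

```python
def classify_pages(page_texts: list[str], min_chars: int = 20) -> list[str]:
    """Label each page 'text' if its text layer has enough characters, else 'scanned'."""
    labels = []
    for text in page_texts:
        visible = sum(1 for ch in text if not ch.isspace())
        labels.append("text" if visible >= min_chars else "scanned")
    return labels


def classify_document(page_texts: list[str], min_chars: int = 20) -> str:
    """Summarize a document as 'text', 'scanned', 'mixed', or 'empty'."""
    labels = classify_pages(page_texts, min_chars)
    if not labels:
        return "empty"
    scanned = labels.count("scanned")
    if scanned == 0:
        return "text"
    if scanned == len(labels):
        return "scanned"
    return "mixed"  # route only the scanned pages through OCR


pages = ["Chapter 1: Installation and setup instructions", "", "   \n  "]
print(classify_pages(pages))
print(classify_document(pages))
```

A per-page decision matters because many real documents are mixed: only the pages without a usable text layer need to go through OCR.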

Use Cases

01 Ingesting legacy PDF documentation into a structured knowledge base for RAG systems

02 Converting technical PDF manuals into Markdown documentation for AI context stacks

03 Extracting tabular data from financial reports for automated analysis
