This skill converts PDF documents into structured formats optimized for large language models (LLMs). It combines Docling for AI-powered structure preservation, PyMuPDF for high-speed processing, and pdfplumber for maximum fidelity, so users can transform academic papers, research documents, and complex reports into Markdown with headers, tables, and lists intact. All processing runs on-device to protect data privacy, which makes the skill well suited to RAG system preparation, batch data processing, and local document analysis within the Claude Code environment.
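As a rough illustration of how the three libraries could sit behind one entry point, here is a minimal sketch. The names `BACKENDS`, `pick_backend`, and `convert_pdf`, as well as the priority keys, are hypothetical and not taken from the skill itself; only the library calls (`DocumentConverter().convert`, `fitz.open` with `page.get_text()`, `pdfplumber.open` with `page.extract_text()`) are the libraries' own.

```python
# Hypothetical mapping from a caller's priority to the library the
# description associates with it; the keys are illustrative names.
BACKENDS = {
    "structure": "docling",
    "speed": "pymupdf",
    "fidelity": "pdfplumber",
}


def pick_backend(priority: str) -> str:
    """Return the backend name for a priority, or raise on unknown input."""
    try:
        return BACKENDS[priority]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None


def convert_pdf(path: str, priority: str = "structure") -> str:
    """Convert one PDF locally; each library is imported only when used.

    Docling returns Markdown; the other two branches return plain text.
    """
    backend = pick_backend(priority)
    if backend == "docling":
        from docling.document_converter import DocumentConverter

        result = DocumentConverter().convert(path)
        return result.document.export_to_markdown()
    if backend == "pymupdf":
        import fitz  # PyMuPDF

        with fitz.open(path) as doc:
            return "\n\n".join(page.get_text() for page in doc)
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n\n".join((page.extract_text() or "") for page in pdf.pages)
```

Under these assumptions, `convert_pdf("paper.pdf", priority="speed")` would read the file with PyMuPDF; the lazy imports mean only the chosen library has to be installed.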
Key Features
01 Lossless text extraction with pdfplumber for maximum fidelity
02 Privacy-first local processing with no external API calls or data leaks
03 Standardized Markdown output optimized for LLM context windows
04 5 GitHub stars
05 AI-powered structure preservation using Docling for headers and tables
06 High-speed batch processing via PyMuPDF (up to 60x faster than alternatives)
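To make the "tables intact" idea concrete, the sketch below shows one way extracted table rows might be rendered as a Markdown table. It assumes the row format that pdfplumber's `page.extract_tables()` returns (a list of rows, each a list of cell strings or `None`). The helper names `rows_to_markdown` and `pdf_tables_to_markdown` are hypothetical, and treating the first row as the header is an assumption, not something the skill documents.

```python
def rows_to_markdown(rows):
    """Render a table (list of rows; cells may be None) as a Markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell):
        # None marks an empty cell; newlines and pipes would break the row.
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    def line(row):
        # Pad short rows so every line has the same number of columns.
        cells = [clean(c) for c in row] + [""] * (width - len(row))
        return "| " + " | ".join(cells) + " |"

    # Assumption: the first row is the header row.
    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)])


def pdf_tables_to_markdown(path):
    """Collect every table pdfplumber finds in a PDF as Markdown strings."""
    import pdfplumber

    tables = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                tables.append(rows_to_markdown(table))
    return tables
```

For example, `rows_to_markdown([["Name", "Score"], ["A", None]])` yields a two-column table whose empty cell is left blank rather than printed as `None`.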