Can this skill handle scanned PDFs?

Yes, it includes guidance for using pytesseract and pdf2image to perform OCR on scanned documents to extract text from images.
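As a rough illustration of that OCR flow (not the skill's own code), the sketch below renders each page to an image with pdf2image and passes it to pytesseract. The function names `ocr_pdf` and `join_pages` are hypothetical, and the sketch assumes the poppler and tesseract binaries are installed alongside the Python packages.

```python
def ocr_pdf(pdf_path, dpi=300, lang="eng"):
    """Return a list with the OCR text of each page of a scanned PDF."""
    # Imported lazily so this module still loads without the OCR stack.
    from pdf2image import convert_from_path  # needs poppler installed
    import pytesseract  # needs the tesseract binary installed

    # Render every page to a PIL image, then run OCR on each image.
    pages = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]


def join_pages(page_texts):
    """Join per-page text, marking where each page starts."""
    return "\n\n".join(
        f"--- Page {number} ---\n{text.strip()}"
        for number, text in enumerate(page_texts, start=1)
    )
```

A call such as `join_pages(ocr_pdf("scan.pdf"))` would then produce one string for the whole document; `scan.pdf` is a placeholder file name.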

Can I merge or split PDFs from the command line?

Yes, the skill provides specific commands for using high-performance CLI tools like qpdf and pdftk for rapid document manipulation without writing code.
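For readers who want to script those tools, here is a minimal sketch that builds qpdf command lines from Python and runs them with `subprocess`. The helper names are hypothetical and the exact commands the skill recommends are not shown on this page; the argument order follows qpdf's documented `--pages` syntax.

```python
import subprocess


def qpdf_merge_cmd(inputs, output):
    """Build: qpdf --empty --pages a.pdf b.pdf -- out.pdf"""
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def qpdf_extract_cmd(source, page_range, output):
    """Build: qpdf in.pdf --pages . 1-3 -- out.pdf ('.' reuses the input file)."""
    return ["qpdf", source, "--pages", ".", page_range, "--", output]


def run(cmd):
    """Run the command and raise CalledProcessError if qpdf reports failure."""
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.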

How does it handle complex tables within PDFs?

It utilizes pdfplumber's advanced extraction features to identify table structures and convert them directly into structured formats like pandas DataFrames.
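One possible shape for that step is sketched below, assuming the first row of every extracted table is its header (real PDFs often break this assumption). `table_to_records` and `extract_table_records` are hypothetical helper names; `page.extract_tables()` is pdfplumber's call and returns each table as a list of rows, with `None` for empty cells.

```python
def table_to_records(table):
    """Turn a list-of-rows table (first row = header) into a list of dicts."""
    header, *rows = table
    # Empty header cells come back as None, so normalise them to "".
    keys = [(cell or "").strip() for cell in header]
    return [dict(zip(keys, row)) for row in rows]


def extract_table_records(pdf_path):
    """Collect records from every table on every page of the PDF."""
    import pdfplumber  # imported lazily; only needed when reading a real file

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                if table:  # skip empty results
                    records.extend(table_to_records(table))
    return records
```

The resulting list of dicts can be handed to `pandas.DataFrame(records)` to get a DataFrame.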

Which Python libraries are supported for PDF work?

The skill covers pypdf for basic operations, pdfplumber for precise table extraction, and reportlab for professional document creation.
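To give a sense of what a "basic operation" with pypdf can look like, the following sketch splits a PDF into fixed-size chunks. It is an illustration under assumptions, not the skill's own pattern: `chunk_ranges`, `split_pdf`, and the `part_N.pdf` naming scheme are all hypothetical.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) page index pairs covering total_pages."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(pdf_path, chunk_size, prefix="part"):
    """Write consecutive chunks of the PDF to prefix_1.pdf, prefix_2.pdf, ..."""
    from pypdf import PdfReader, PdfWriter  # imported lazily

    reader = PdfReader(pdf_path)
    outputs = []
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    for index, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for page_number in range(start, stop):
            writer.add_page(reader.pages[page_number])
        out_path = f"{prefix}_{index}.pdf"
        with open(out_path, "wb") as handle:
            writer.write(handle)
        outputs.append(out_path)
    return outputs
```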

Does it support PDF security and password protection?

Yes, the toolkit includes methods for encrypting, decrypting, and managing permissions for password-protected PDF files.
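A minimal sketch of password protection with pypdf might look like the following. It assumes a recent pypdf release (keyword names differ in the older PyPDF2 package), the helper names and the `_encrypted` file suffix are hypothetical, and permission flags are left at the library defaults.

```python
from pathlib import Path


def encrypted_name(src):
    """Derive an output name such as report_encrypted.pdf from report.pdf."""
    path = Path(src)
    return str(path.with_name(f"{path.stem}_encrypted{path.suffix}"))


def encrypt_pdf(src, user_password, owner_password=None):
    """Copy src page by page into a new password-protected file."""
    from pypdf import PdfReader, PdfWriter  # imported lazily

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.encrypt(user_password=user_password, owner_password=owner_password)
    dst = encrypted_name(src)
    with open(dst, "wb") as handle:
        writer.write(handle)
    return dst


def open_protected(src, password):
    """Return a reader for src, unlocking it first when it is encrypted."""
    from pypdf import PdfReader

    reader = PdfReader(src)
    # decrypt() returns a falsy value when the password does not match.
    if reader.is_encrypted and not reader.decrypt(password):
        raise ValueError("wrong password")
    return reader
```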

PDF Processing & Manipulation

Name: PDF Processing & Manipulation
Author: xiaxingxiaowei1983

Content Management

Provides a comprehensive toolkit for programmatic PDF extraction, creation, merging, and form handling using Python and CLI tools.

The PDF Processing skill empowers Claude to perform advanced document operations including text and table extraction, PDF generation, merging, splitting, and OCR for scanned documents. It provides optimized patterns for industry-standard libraries like pypdf, pdfplumber, and reportlab, as well as powerful command-line utilities like qpdf and poppler-utils. This skill is ideal for developers and data scientists needing to automate document workflows, process digital forms at scale, or transform unstructured PDF data into structured formats like pandas DataFrames or Excel files.

Key Features

01 Perform OCR on scanned PDFs using pytesseract and pdf2image

02 Programmatically generate custom PDFs and reports using reportlab

03 Merge, split, rotate, and manage PDF pages using Python or CLI tools

04 Handle advanced tasks like watermarking, encryption, and form filling

05 Extract structured text and complex tabular data from PDF documents

Use Cases

01 Building a backend service to generate dynamic PDF reports, certificates, or receipts

02 Automating the extraction of financial data from PDF invoices into structured Excel sheets

03 Batch processing document uploads to merge, split, or secure multiple PDF files
