PDFtotext

An MCP (Model Context Protocol) server that extracts text from PDF documents using the `pdftotext` utility, designed for reliable integration with MCP clients.

Overview

PDFtotext is a lightweight Model Context Protocol (MCP) server for extracting text from PDF documents. Unlike other PDF MCP servers, it keeps JSON-RPC communication clean: nothing other than protocol messages is written to stdout, so clients receive consistent, parseable responses. It is built on the mature `pdftotext` utility from poppler-utils and can extract text from an entire document or from specific pages, preserve the original layout, output in multiple text encodings, and return metadata alongside the text. It also validates input files, applies security checks, and reports errors in detail. The project describes itself as production-tested and feature-complete across various MCP client environments.
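To illustrate what "no stdout pollution" means in practice, here is a minimal sketch in Python. It is not taken from the PDFtotext source; the logger name, the `handle_line` helper, and the `{"ok": True}` result are hypothetical. It assumes newline-delimited JSON-RPC 2.0 messages on stdout, with all diagnostics routed to stderr.

```python
import json
import logging
import sys

# Send all diagnostics to stderr so stdout carries only protocol messages.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("pdf-server-sketch")  # hypothetical logger name


def make_response(request_id, result):
    """Build a JSON-RPC 2.0 success response object."""
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


def write_message(message, stream=None):
    """Write one JSON message per line to the protocol stream (stdout by default)."""
    if stream is None:
        stream = sys.stdout
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def handle_line(line, stream=None):
    """Parse one request line, log to stderr, and answer on the protocol stream."""
    request = json.loads(line)
    log.info("handling method %s", request.get("method"))  # goes to stderr
    # Placeholder result; a real server would dispatch on the method name.
    write_message(make_response(request.get("id"), {"ok": True}), stream)
```

A stray `print()` anywhere in such a server would interleave plain text with the JSON stream and break the client's parser, which is why keeping stdout reserved for protocol messages matters.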

Key Features

  • Includes robust file validation, security checks, and detailed error reporting
  • Preserves original layout formatting and supports multiple text encodings
  • Extracts text from entire PDF documents or specific pages
  • Provides comprehensive metadata in responses, including word count and file info
  • Ensures reliable and clean JSON-RPC communication without stdout pollution
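The extraction features listed above map onto standard `pdftotext` command-line options (`-f`, `-l`, `-layout`, `-enc`). The following Python sketch shows one way such a call could be wrapped. It is an illustration under assumptions, not the server's actual implementation: the function names and the shape of the returned dictionary are hypothetical, and the file check is reduced to a simple existence test.

```python
import os
import subprocess


def build_command(pdf_path, first_page=None, last_page=None,
                  layout=False, encoding="UTF-8"):
    """Assemble a pdftotext argument list; '-' sends the text to stdout."""
    cmd = ["pdftotext", "-enc", encoding]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # last page to convert
    if layout:
        cmd.append("-layout")           # keep the original physical layout
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, first_page=None, last_page=None, layout=False):
    """Run pdftotext (must be installed) and return text plus basic metadata."""
    # Minimal validation only; a real server would check far more than this.
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(pdf_path)
    completed = subprocess.run(
        build_command(pdf_path, first_page, last_page, layout),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    text = completed.stdout.decode("utf-8", errors="replace")
    return {
        "text": text,
        "word_count": len(text.split()),               # hypothetical field
        "file_size_bytes": os.path.getsize(pdf_path),  # hypothetical field
    }
```

For example, `extract_text("report.pdf", first_page=2, last_page=3, layout=True)` would convert only pages 2 and 3 while preserving layout, provided `report.pdf` exists and poppler-utils is installed.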

Use Cases

  • Integrating PDF text extraction capabilities into AI applications like Claude Desktop
  • Programmatically extracting structured or raw text from PDF documents for analysis
  • Automating content processing workflows that require text content from PDFs