Uses Apache Tika to extract content and metadata from local files in formats such as PDF, DOCX, and TXT, returning HTML or plain text.
The Tika Extractor is a Model Context Protocol (MCP) compliant server that uses Apache Tika to extract content and metadata from files stored in a local directory, in formats including PDF, DOCX, and TXT. It converts documents either to plain text or to HTML with optional CSS styling for readability, and it can also list the available files and retrieve detailed metadata. Built with Java, Spring Boot, and Jetty, it integrates with MCP-compliant clients and exposes REST endpoints for testing and direct HTML rendering, which makes it suitable for secure, offline document processing workflows.
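To illustrate the file-listing capability described above, here is a minimal sketch using only the Java standard library. It is not the server's actual implementation: the class name `LocalDocumentLister`, the method `listDocuments`, and the extension whitelist are all assumptions made for this example, and the real server may accept any format Apache Tika can parse.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the Tika Extractor code base.
public class LocalDocumentLister {

    // Assumed whitelist, taken from the formats named in the description.
    private static final Set<String> SUPPORTED = Set.of("pdf", "docx", "txt");

    /** Returns the sorted names of regular files in dir with a supported extension. */
    public static List<String> listDocuments(Path dir) throws IOException {
        // Files.list is not recursive; the stream must be closed after use.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .filter(LocalDocumentLister::hasSupportedExtension)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Case-insensitive check on the text after the last dot.
    private static boolean hasSupportedExtension(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return false;
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(ext);
    }

    public static void main(String[] args) throws IOException {
        // Lists the directory given as the first argument, or the current one.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        listDocuments(dir).forEach(System.out::println);
    }
}
```

Listing only the top level of a single directory keeps the set of readable files easy to reason about, which fits the local, offline use case; a recursive walk would be a separate design decision.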