Provides a Retrieval-Augmented Generation (RAG) database for PDF documents, using semantic chunking and vector search to retrieve relevant content.
This tool runs as a Model Context Protocol (MCP) server that turns a collection of PDF documents into a RAG database. It splits each PDF into semantically coherent chunks, embeds every chunk with the `multi-qa-mpnet-base-dot-v1` model, and stores the embeddings persistently in ChromaDB. Users can run both semantic similarity searches and traditional keyword searches across the collection. Key features include automatic OCR for scanned PDFs, reporting the document name and page number for each retrieved chunk, and flexible management of the document collection. It works with MCP clients such as Claude Desktop and suits research, Q&A over documentation, and knowledge base management.
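The two search modes can be illustrated with a minimal, standard-library-only sketch. It is not this tool's implementation: the `Chunk` class, the function names, and the toy 3-dimensional vectors are hypothetical stand-ins for real `multi-qa-mpnet-base-dot-v1` embeddings stored in ChromaDB. Semantic search ranks chunks by dot product with the query embedding (the scoring that `dot`-variant model is meant for), while keyword search is assumed here to be a case-insensitive substring match. Each result carries its document name and page number.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    # Hypothetical record: chunk text plus the metadata returned with results.
    text: str
    document: str
    page: int
    embedding: List[float]  # toy vector standing in for a real model embedding


def dot(a: List[float], b: List[float]) -> float:
    # Dot-product similarity between two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def semantic_search(chunks: List[Chunk], query_embedding: List[float], k: int = 3) -> List[Chunk]:
    # Rank all chunks by dot product with the query and keep the top k.
    ranked = sorted(chunks, key=lambda c: dot(c.embedding, query_embedding), reverse=True)
    return ranked[:k]


def keyword_search(chunks: List[Chunk], keyword: str) -> List[Chunk]:
    # Assumed keyword mode: case-insensitive substring match on chunk text.
    kw = keyword.lower()
    return [c for c in chunks if kw in c.text.lower()]


chunks = [
    Chunk("ChromaDB persists vectors on disk.", "manual.pdf", 3, [0.9, 0.1, 0.0]),
    Chunk("Scanned pages are processed with OCR.", "manual.pdf", 7, [0.1, 0.8, 0.2]),
    Chunk("Results include page numbers.", "faq.pdf", 1, [0.2, 0.1, 0.9]),
]
query = [1.0, 0.0, 0.1]  # toy query embedding

for c in semantic_search(chunks, query, k=2):
    print(f"{c.document} p.{c.page}: {c.text}")
for c in keyword_search(chunks, "ocr"):
    print(f"{c.document} p.{c.page}: {c.text}")
```

With these toy vectors the semantic query scores 0.9, 0.12, and 0.29 against the three chunks, so it returns `manual.pdf` page 3 followed by `faq.pdf` page 1, and the keyword query returns only `manual.pdf` page 7. In the real tool, the embedding model and ChromaDB handle vector generation, storage, and nearest-neighbour lookup.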