Provides a Retrieval-Augmented Generation (RAG) database for PDF documents, using semantic chunking and vector search to retrieve relevant content.
This tool runs as a Model Context Protocol (MCP) server that turns a collection of PDF documents into a RAG database. It splits PDF text with semantic chunking, encodes each chunk as an embedding with the `multi-qa-mpnet-base-dot-v1` model, and stores the embeddings persistently in ChromaDB. Users can run both semantic similarity searches and traditional keyword searches across the collection. Other features include automatic OCR for scanned PDFs, document-name and page-number tracking for every retrieved chunk, and management of the document collection. It works with MCP clients such as Claude Desktop and suits research, question answering over documentation, and knowledge base management.
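Semantic chunking can be implemented in several ways. The sketch below shows one common approach and is not necessarily the one this server uses: consecutive sentences stay in the same chunk until their cosine similarity drops below a threshold. The `embed` function is pluggable (a real deployment might wrap an embedding model such as `multi-qa-mpnet-base-dot-v1`), and `semantic_chunks`, `threshold`, and the toy keyword-count embedder are hypothetical names chosen so the example runs without external libraries.

```python
import math
import re
from typing import Callable, List, Sequence

# Any function that maps a sentence to a vector; a real deployment would
# call an embedding model here.
Embedder = Callable[[str], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def semantic_chunks(text: str, embed: Embedder, threshold: float = 0.5) -> List[str]:
    """Group consecutive sentences, opening a new chunk whenever a sentence's
    similarity to the previous sentence falls below `threshold`."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    groups: List[List[str]] = [[sentences[0]]]
    prev_vec = embed(sentences[0])
    for sentence in sentences[1:]:
        vec = embed(sentence)
        if cosine(prev_vec, vec) < threshold:
            groups.append([sentence])    # topic shift: start a new chunk
        else:
            groups[-1].append(sentence)  # same topic: extend the current chunk
        prev_vec = vec
    return [" ".join(g) for g in groups]


# Toy embedder for illustration only: counts a few fixed keywords.
VOCAB = ["pdf", "ocr", "vector", "search"]


def toy_embed(sentence: str) -> List[float]:
    words = re.findall(r"[a-z]+", sentence.lower())
    return [float(words.count(w)) for w in VOCAB]


text = ("PDF files may need OCR. OCR reads scanned PDF pages. "
        "Vector search finds similar text. Search uses vector scores.")
for chunk in semantic_chunks(text, toy_embed):
    print(chunk)
```

With the toy embedder, the two OCR sentences share a vector and stay together, while the switch to vector-search vocabulary drops the similarity to zero and starts a second chunk. The threshold trades chunk size against topical purity and would need tuning for a real model.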
Key Features
01 Source Tracking for document names and page numbers
02 OCR Support for scanned and image-based PDFs
03 Keyword Search for exact term matching
04 Vector Search for semantically similar content
05 Semantic Chunking for meaning-based text segmentation
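The following hypothetical sketch shows how source tracking fits with the two search modes: each chunk carries its document name and page number, vector search ranks chunks by dot product with the query embedding (the `-dot-` in `multi-qa-mpnet-base-dot-v1` indicates a model tuned for dot-product scoring), and keyword search uses whole-word, case-insensitive matching. `Chunk` and `ChunkStore` are illustrative names, not this server's API, and the in-memory list stands in for ChromaDB.

```python
import re
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Chunk:
    """A stored text chunk plus the source metadata returned with results."""
    text: str
    document: str               # PDF file name the chunk came from
    page: int                   # page number within that PDF
    embedding: Sequence[float]  # vector produced by the embedding model


@dataclass
class ChunkStore:
    """Hypothetical in-memory stand-in for a persistent vector store."""
    chunks: List[Chunk] = field(default_factory=list)

    def add(self, chunk: Chunk) -> None:
        self.chunks.append(chunk)

    def vector_search(self, query_vec: Sequence[float], k: int = 3) -> List[Tuple[float, Chunk]]:
        """Return the k chunks with the highest dot product against the query."""
        scored = [
            (sum(q * c for q, c in zip(query_vec, ch.embedding)), ch)
            for ch in self.chunks
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

    def keyword_search(self, term: str) -> List[Chunk]:
        """Return chunks containing `term` as a whole word, ignoring case."""
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        return [ch for ch in self.chunks if pattern.search(ch.text)]


store = ChunkStore()
store.add(Chunk("ChromaDB persists embeddings on disk.", "guide.pdf", 2, [0.9, 0.1]))
store.add(Chunk("Scanned pages need OCR before indexing.", "ocr.pdf", 5, [0.1, 0.8]))

# Each hit carries its document name and page number for citation.
for score, ch in store.vector_search([1.0, 0.0], k=1):
    print(f"{ch.document} p.{ch.page}: {ch.text} (score={score:.2f})")

for ch in store.keyword_search("ocr"):
    print(f"{ch.document} p.{ch.page}: {ch.text}")
```

Keeping the page number in each chunk's metadata is what lets a client cite `guide.pdf p.2` instead of returning bare text; a persistent store such as ChromaDB can hold the same fields as per-chunk metadata alongside the embedding.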
Use Cases
01 Building a research database from academic papers and articles
02 Managing and searching a comprehensive knowledge base
03 Performing question answering over documentation and manuals