01. Ingests PDFs, images, and text files, with automatic page classification and OCR via AWS Textract
02. 0 GitHub stars
03. Provides a command-line interface (CLI) for direct interaction, document ingestion, and monitoring
04. Generates vector embeddings locally with snowflake-arctic-embed-m-v1.5 for semantic search
05. Uses LanceDB for fast, local vector storage, with no external database required
06. Exposes an MCP server with tools for semantic search, ingestion, and document management
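
The semantic search listed above comes down to ranking stored document vectors by their similarity to a query vector. The following is a minimal, standard-library sketch of that idea, not the project's implementation: the three-dimensional vectors, the file names, and the `search` helper are all hypothetical stand-ins for real snowflake-arctic-embed-m-v1.5 embeddings and a LanceDB table.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def search(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: in a real setup these vectors would come from
# the embedding model and be stored in a vector database.
index = {
    "invoice.pdf": [0.9, 0.1, 0.0],
    "scan.png": [0.1, 0.8, 0.1],
    "notes.txt": [0.0, 0.2, 0.9],
}

results = search([0.8, 0.2, 0.0], index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A brute-force scan like this is fine for a handful of documents; a vector store such as LanceDB exists to make the same nearest-neighbor lookup efficient over much larger collections.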