01 Supports 15+ input formats, including PDF, DOCX, PPTX, and various image types
02 Integrated OCR support with EasyOCR, Tesseract, and RapidOCR engines
03 Preserves hierarchical document structure for better context awareness
04 14 GitHub stars
05 Advanced table extraction with cell-matching and accuracy modes
06 Built-in hierarchical and hybrid chunking for RAG pipeline integration
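
To make the idea of hierarchy-preserving chunking concrete, here is a minimal standard-library sketch. It is not the tool's actual API: the `Chunk` type, the `hierarchical_chunks` helper, and the use of `#`-prefixed lines as headings are all assumptions made for illustration. It shows how each chunk can carry its heading path, so a retriever sees the context a passage came from.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    headings: tuple[str, ...]  # heading path from the document root to this text
    text: str


def hierarchical_chunks(lines):
    """Group body lines under their heading path (hypothetical helper).

    Assumption: a line starting with one or more '#' characters is a heading,
    and the number of '#' characters gives its level.
    """
    path = []    # current stack of headings
    body = []    # body lines collected under the current heading path
    chunks = []

    def flush():
        # Emit the collected body text as one chunk, tagged with the path.
        if body:
            chunks.append(Chunk(tuple(path), " ".join(body)))
            body.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            title = line[level:].strip()
            # Drop headings at the same or a deeper level, then push this one.
            del path[level - 1:]
            path.append(title)
        else:
            body.append(line)
    flush()
    return chunks
```

A hybrid strategy would typically go one step further and split or merge these chunks against a token budget; that step is omitted here to keep the sketch short.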