01 LLM-optimized Markdown output for preserving document structure
02 Native integration with Mistral OCR API for complex and scanned layouts
03 Multi-engine support including PyMuPDF, pdfplumber, and Tesseract
04 Specialized handling for tables, math formulas, and multilingual text
05 Automated decision guide to select the best extraction tool based on PDF type
06 0 GitHub stars
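
As a rough illustration of the decision-guide idea, the sketch below maps a few PDF traits to one of the engines named in the list. The trait names, the `allow_remote` flag, the returned labels, and the routing rules themselves are assumptions made for this example; the tool's actual selection logic is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass
class PdfTraits:
    """Hypothetical description of a PDF, filled in by the caller."""
    has_text_layer: bool          # False for scanned / image-only PDFs
    has_tables: bool = False
    has_math: bool = False
    complex_layout: bool = False  # e.g. multi-column or mixed text and figures


def choose_extractor(traits: PdfTraits, allow_remote: bool = True) -> str:
    """Return the name of an extraction engine for the given traits.

    Assumed routing: anything needing OCR or layout analysis goes to the
    Mistral OCR API when remote calls are allowed, otherwise to local
    Tesseract; table-heavy digital PDFs go to pdfplumber; everything else
    goes to PyMuPDF.
    """
    needs_ocr = (
        not traits.has_text_layer
        or traits.complex_layout
        or traits.has_math
    )
    if needs_ocr:
        return "mistral-ocr" if allow_remote else "tesseract"
    if traits.has_tables:
        return "pdfplumber"
    return "pymupdf"


if __name__ == "__main__":
    scanned = PdfTraits(has_text_layer=False)
    report = PdfTraits(has_text_layer=True, has_tables=True)
    print(choose_extractor(scanned))
    print(choose_extractor(scanned, allow_remote=False))
    print(choose_extractor(report))
```

Keeping the choice in one pure function makes the routing easy to test and adjust without touching any extraction code.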