Granular extraction of specific page ranges or individual pages
OCR support for scanned documents and image-based PDFs using Tesseract
Integration with Obsidian and other Markdown-based note-taking workflows
Conversion of DOCX files into clean Markdown that preserves the document's structure
Processing of large files through page-by-page extraction and chunk splitting
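The page-range selection and chunk splitting listed above can be illustrated with a minimal Python sketch. It assumes a comma-separated page spec such as "1-3,7" with 1-based, inclusive ranges. `parse_page_spec` and `chunk_pages` are hypothetical helpers, not the tool's own API. No PDF library is involved, so only the page bookkeeping is shown.

```python
from typing import Iterator, List


def parse_page_spec(spec: str, page_count: int) -> List[int]:
    """Turn a spec like "1-3,7" into sorted, de-duplicated 1-based page numbers.

    Hypothetical helper: the assumed syntax is comma-separated single pages
    or inclusive "start-end" ranges. Malformed numbers raise ValueError via int().
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
        else:
            start = end = int(part)
        if start < 1 or end > page_count:
            raise ValueError(f"range {part!r} outside 1..{page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def chunk_pages(pages: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most chunk_size pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    for i in range(0, len(pages), chunk_size):
        yield pages[i:i + chunk_size]


if __name__ == "__main__":
    selected = parse_page_spec("1-3, 7, 10-12", page_count=20)
    print(selected)  # [1, 2, 3, 7, 10, 11, 12]
    for chunk in chunk_pages(selected, chunk_size=3):
        print(chunk)  # [1, 2, 3], then [7, 10, 11], then [12]
```

Sorting and de-duplicating before chunking ensures that overlapping ranges such as "1-3,2" never yield a page twice. Each chunk can then go to whatever extractor does the actual text or OCR work.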