Table detection and conversion to CSV/JSON/Markdown formats
Section and heading identification for structural document mapping
OCR support for processing scanned, image-based PDF documents
Automated key-point summarization and metadata extraction
Multi-engine layout analysis using pdfplumber and tabula-py
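The table-conversion feature can be illustrated with a minimal sketch. It is not this project's implementation. It assumes tables come from pdfplumber's `Page.extract_tables()`, which returns each table as a list of rows whose cells are strings or `None`. It also assumes the first row of each table is the header. The helper names (`extract_tables`, `table_to_csv`, `table_to_json`, `table_to_markdown`) are hypothetical.

```python
import csv
import io
import json
from typing import List, Optional

# Shape of one table as returned by pdfplumber's Page.extract_tables():
# a list of rows, each row a list of cell strings (None for empty cells).
Table = List[List[Optional[str]]]


def _clean(table: Table) -> List[List[str]]:
    """Replace None cells with "" and collapse in-cell newlines to spaces."""
    return [[(cell or "").replace("\n", " ").strip() for cell in row] for row in table]


def _escape_md(text: str) -> str:
    """Escape pipe characters so they do not split Markdown table cells."""
    return text.replace("|", "\\|")


def table_to_csv(table: Table) -> str:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(_clean(table))
    return buf.getvalue()


def table_to_json(table: Table) -> str:
    """Assumption: the first row is the header; emit one object per body row."""
    rows = _clean(table)
    header, body = rows[0], rows[1:]
    return json.dumps([dict(zip(header, row)) for row in body])


def table_to_markdown(table: Table) -> str:
    """Assumption: the first row is the header row of the Markdown table."""
    rows = _clean(table)
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(_escape_md(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(_escape_md(c) for c in row) + " |" for row in body]
    return "\n".join(lines)


def extract_tables(pdf_path: str) -> List[Table]:
    """Collect tables from every page with pdfplumber (imported lazily)."""
    import pdfplumber  # third-party dependency: pip install pdfplumber

    tables: List[Table] = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables


if __name__ == "__main__":
    sample = [["Name", "Qty"], ["Widget", "3"], [None, "5"]]
    print(table_to_csv(sample))
    print(table_to_json(sample))
    print(table_to_markdown(sample))
```

The converters work on plain lists, so they can be used independently of the extraction engine. Output from tabula-py, which returns pandas DataFrames, would need its own adapter.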