01 Integrates with downstream tasks such as classification and entity extraction
02 Processes common image formats and multi-page scanned PDF documents
03 Supports 100+ languages, including CJK scripts, and handles handwritten text
04 Supports multiple backends, including Surya, EasyOCR, PaddleOCR, and Tesseract
05 Produces detailed output, including extraction confidence scores and layout metadata
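As an illustration of how confidence scores and layout metadata might feed a downstream step, here is a minimal Python sketch. The `OcrLine` record, its field names, and the helpers `filter_confident` and `to_plain_text` are hypothetical and are not taken from this tool or from any of the listed backends; the 0.8 threshold is likewise an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    """Hypothetical record for one recognized line of text (not a real API)."""
    text: str
    confidence: float                 # assumed to be in the range 0.0 to 1.0
    bbox: tuple[int, int, int, int]   # assumed (x0, y0, x1, y1) in pixels
    page: int                         # 1-based page index for multi-page PDFs


def filter_confident(lines, threshold=0.8):
    """Keep only lines whose confidence is at or above the threshold."""
    return [ln for ln in lines if ln.confidence >= threshold]


def to_plain_text(lines):
    """Join lines in reading order: by page, then top-to-bottom, then left-to-right."""
    ordered = sorted(lines, key=lambda ln: (ln.page, ln.bbox[1], ln.bbox[0]))
    return "\n".join(ln.text for ln in ordered)


# Made-up sample data standing in for backend output.
sample = [
    OcrLine("Total: 42.00", 0.97, (10, 200, 300, 220), 1),
    OcrLine("Invoice", 0.99, (10, 20, 200, 50), 1),
    OcrLine("~~smudge~~", 0.31, (10, 100, 80, 120), 1),
]

clean_text = to_plain_text(filter_confident(sample))
```

Filtering on confidence before passing text to a classifier or entity extractor is one common way to keep low-quality recognitions out of later stages; the right threshold depends on the backend and the documents involved.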