MinerU
Created by opendatalab
Extracts data from PDF documents, converting them into Markdown and JSON formats.
About
MinerU is a high-quality, open-source tool for extracting data from PDF documents. It converts PDFs into Markdown and JSON, two formats that are easy to read and to parse programmatically, which simplifies document analysis, data mining, and integration with other applications. It supports layout analysis and OCR, and offers both pre-trained and LLM-based extraction, so it can handle a wide range of document-processing tasks.
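Batch conversion is a common way to drive a tool like this from another program. The following is a minimal sketch that assumes MinerU 2.x is installed and exposes a `mineru` command taking `-p <input_path>` and `-o <output_dir>`. Verify the command name and flags with `mineru --help` for your installed version. The helper names `build_mineru_command` and `convert_folder` are hypothetical and not part of MinerU.

```python
import shutil
import subprocess
from pathlib import Path


def build_mineru_command(pdf_path: Path, out_dir: Path) -> list[str]:
    """Build the argument list for one MinerU CLI run.

    Assumption: the CLI is named `mineru` and accepts `-p <input_path>`
    and `-o <output_dir>`; check `mineru --help` for your version.
    """
    return ["mineru", "-p", str(pdf_path), "-o", str(out_dir)]


def convert_folder(pdf_dir: Path, out_dir: Path) -> list[Path]:
    """Run MinerU once per PDF in pdf_dir and return the PDFs submitted."""
    if shutil.which("mineru") is None:
        raise FileNotFoundError("`mineru` executable not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        # check=True raises CalledProcessError if MinerU exits non-zero.
        subprocess.run(build_mineru_command(pdf, out_dir), check=True)
        converted.append(pdf)
    return converted
```

The sketch keeps command construction separate from execution. This lets the argument list be checked or adjusted for a different CLI version without touching the loop.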
Key Features
- Converts PDFs to Markdown format
- Converts PDFs to JSON format
- Layout analysis
- OCR support
- Pre-trained and LLM-based extraction
- 30,355 GitHub stars
Use Cases
- Converting legal documents into structured data
- Automated data extraction from research papers
- Processing financial reports for analysis
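For use cases like these, the Markdown output usually needs one more step to become structured records. The following is a minimal sketch that assumes section titles in the output appear as ATX headings (`#`, `##`, ...). `split_sections` is a hypothetical helper, not a MinerU API. It splits the text into one record per heading, for example one per clause of a contract or one per section of a paper.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, the title, optional closing '#'s.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def _has_content(section: dict) -> bool:
    # Keep a section if it has a title or at least one non-blank body line.
    return bool(section["title"]) or any(text.strip() for text in section["body"])


def split_sections(markdown: str) -> list[dict]:
    """Split Markdown into {"level", "title", "body"} records at each ATX heading.

    Text before the first heading becomes a level-0 record with an empty title.
    Limitation: '#' lines inside fenced code blocks are treated as headings.
    """
    sections = []
    current = {"level": 0, "title": "", "body": []}
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section before starting a new one.
            if _has_content(current):
                sections.append(current)
            current = {"level": len(match.group(1)), "title": match.group(2), "body": []}
        else:
            current["body"].append(line)
    if _has_content(current):
        sections.append(current)
    # Join body lines and trim surrounding blank lines.
    for section in sections:
        section["body"] = "\n".join(section["body"]).strip()
    return sections
```

Each record can then be serialized with `json.dumps` or loaded into a table. Splitting on headings is only one possible strategy. Documents without a clear heading hierarchy would need a different rule.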