Docling

Simplifies document processing by parsing diverse formats for integration with the generative AI ecosystem.

About

Docling streamlines document processing for generative AI applications. It parses a range of document formats, including PDF, DOCX, XLSX, and HTML, and offers advanced PDF understanding: page layout analysis, reading order detection, and table structure recognition. Parsed content is held in a unified document representation that can be exported to multiple formats for use in different AI workflows. Docling can run locally, which keeps documents on your own machine, and its integrations with frameworks such as LangChain and LlamaIndex speed up AI application development.

Key Features

  • Offers advanced PDF understanding including page layout, reading order, and table structure.
  • 27,053 GitHub stars
  • Provides a unified DoclingDocument representation format.
  • Parses multiple document formats including PDF, DOCX, XLSX, HTML, and images.
  • Supports various export formats including Markdown, HTML, and JSON.
  • Integrates with LangChain, LlamaIndex, CrewAI, and Haystack for agentic AI.
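
The parse-then-export flow described in the features above might look roughly like the following sketch. It assumes the docling package is installed and that DocumentConverter, convert(), and the export_to_markdown / export_to_html / export_to_dict methods behave as in Docling's published quickstart; check them against the version you have installed. The helper names export_document and convert_file are hypothetical, introduced here for illustration only.

```python
import json


def export_document(doc, fmt: str = "markdown") -> str:
    """Serialize a converted document to one of the listed export formats.

    `doc` is assumed to expose the DoclingDocument export methods
    (export_to_markdown, export_to_html, export_to_dict).
    """
    if fmt == "markdown":
        return doc.export_to_markdown()
    if fmt == "html":
        return doc.export_to_html()
    if fmt == "json":
        # export_to_dict is assumed to return a JSON-serializable dict.
        return json.dumps(doc.export_to_dict(), indent=2)
    raise ValueError(f"unsupported export format: {fmt!r}")


def convert_file(source: str, fmt: str = "markdown") -> str:
    """Parse a local path or URL with Docling and return the exported text."""
    # Imported lazily so export_document can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    # result.document is the unified DoclingDocument representation.
    return export_document(result.document, fmt)
```

For example, convert_file("report.pdf", "json") would return the parsed document as a JSON string, where report.pdf is a placeholder file name.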

Use Cases

  • Preparing documents for use in Large Language Models (LLMs).
  • Automating document parsing and data extraction workflows.
  • Building AI applications that require processing information from various document types.
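
As one illustration of preparing documents for LLMs, Markdown exported from a parser can be split into heading-delimited chunks before it is passed to a model. The sketch below is plain Python and does not use Docling's own API; the function name split_markdown_by_heading and the max_chars default are assumptions made for this example.

```python
def split_markdown_by_heading(markdown: str, max_chars: int = 2000) -> list[str]:
    """Split Markdown into sections at heading lines, capping chunk length.

    Simplification: any line starting with '#' is treated as a heading,
    so '#' comments inside fenced code blocks would also start a section.
    """
    sections: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A heading closes the section collected so far.
        if line.startswith("#") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())

    chunks: list[str] = []
    for section in sections:
        if not section:
            continue
        # Oversized sections are cut into fixed-size character windows.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```

Each resulting chunk keeps its heading as the first line, which gives the model some context about where the text came from.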