Docs Scraper

Extracts clean, focused documentation from websites for human readers and LLM consumption.

About

This Python toolkit extracts clean, focused documentation content from websites. It offers four crawling strategies (single page, multi-page, sitemap-based, and menu-based), strips irrelevant elements such as navigation menus and ads, and writes the result as clean Markdown and structured JSON. The output is ready to use in documentation sites, wikis, knowledge bases, LLM training, and RAG systems.
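The sitemap-based strategy can be pictured with a short sketch. This is not the toolkit's own code: the function name `urls_from_sitemap`, the `prefix` filter, and the sample sitemap are assumptions made for illustration, using only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text, prefix=""):
    """Return the <loc> URLs of a sitemap that start with `prefix`.

    Hypothetical helper: the real toolkit may select pages differently.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.iter(f"{SITEMAP_NS}loc"):
        url = (loc.text or "").strip()
        if url and url.startswith(prefix):
            urls.append(url)
    return urls


# Illustrative sitemap; example.com is a placeholder domain.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/blog/news</loc></url>
  <url><loc>https://example.com/docs/api</loc></url>
</urlset>"""

doc_urls = urls_from_sitemap(SAMPLE, prefix="https://example.com/docs/")
print(doc_urls)
```

Filtering by a path prefix is one simple way to keep a crawl focused on documentation pages; a menu-based strategy would instead collect links from the site's navigation.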

Key Features

  • Handles dynamic content and lazy-loaded elements
  • 1 GitHub star
  • Provides colorful terminal feedback for status and errors
  • Automatically identifies main content areas and removes irrelevant sections
  • Offers multiple crawling strategies (single URL, multi-URL, sitemap, and menu-based)
  • Outputs clean Markdown and structured JSON
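As a rough illustration of content-area extraction and Markdown output, the sketch below drops common non-content elements and emits headings, paragraphs, and list items as Markdown. It is a minimal standard-library sketch, not the toolkit's implementation: the class name, the `SKIP_TAGS` set, and the tag-to-Markdown mapping are all assumptions.

```python
from html.parser import HTMLParser

# Assumed set of elements treated as irrelevant page chrome.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCKS = set(HEADINGS) | {"p", "li"}


class MainContentToMarkdown(HTMLParser):
    """Collect text from block elements that sit outside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.prefix = None    # Markdown prefix of the open block, if any
        self.buffer = []
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCKS:
            self.prefix = HEADINGS.get(tag, "- " if tag == "li" else "")
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCKS and self.prefix is not None:
            # Collapse runs of whitespace before emitting the line.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html):
    parser = MainContentToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)


PAGE = (
    "<html><body><nav><ul><li>Home</li></ul></nav>"
    "<main><h1>Install</h1><p>Run <code>pip install foo</code> first.</p></main>"
    "<footer><p>Footer links</p></footer></body></html>"
)
print(html_to_markdown(PAGE))
```

A production scraper would need more than this (nested lists, code blocks, tables, and content rendered by JavaScript), but the sketch shows the basic idea of filtering page chrome before conversion.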

Use Cases

  • Preparing documentation for LLM training and RAG systems
  • Creating documentation sites and wikis
  • Building knowledge bases from dependency documentation