
Docs Scraper

Created by felores

Extracts clean, focused documentation from websites for human readers and LLM consumption.

About

This Python toolkit streamlines documentation management by extracting clean, focused content from websites. It offers four crawling strategies (single page, multi-page, sitemap-based, and menu-based) for gathering documentation efficiently. The extracted content is saved as clean Markdown and structured JSON, which makes it suitable for documentation sites, wikis, knowledge bases, LLM training, and RAG systems. By stripping away irrelevant elements such as navigation menus and ads, it produces a documentation source that is ready to use in these applications.
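As an illustration of the content-cleaning step described above, here is a minimal standard-library sketch that drops navigation, header, footer, sidebar, script, and style elements and emits the remaining headings, paragraphs, and list items as Markdown. It is not the toolkit's implementation: the class `MainContentExtractor`, the function `html_to_markdown`, and the list of skipped tags are hypothetical, and the sketch assumes reasonably well-formed HTML with explicitly closed tags.

```python
from html.parser import HTMLParser

# Hypothetical heuristic: elements treated as page chrome and dropped,
# together with everything inside them.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
# Void elements have no closing tag, so they are ignored for depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
HEADINGS = {f"h{i}" for i in range(1, 7)}


class MainContentExtractor(HTMLParser):
    """Hypothetical extractor that keeps headings, paragraphs and list items."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a dropped element
        self.prefix = None    # Markdown prefix of the open block, or None
        self.buffer = []      # text collected for the open block
        self.lines = []       # finished Markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth or tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if tag in HEADINGS:
            self._start_block("#" * int(tag[1]) + " ")
        elif tag == "p":
            self._start_block("")
        elif tag == "li":
            self._start_block("- ")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        if tag in HEADINGS or tag in ("p", "li"):
            self._flush()

    def handle_data(self, data):
        # Text outside a kept block, or inside dropped chrome, is discarded.
        if not self.skip_depth and self.prefix is not None:
            self.buffer.append(data)

    def close(self):
        super().close()
        self._flush()

    def _start_block(self, prefix):
        self._flush()
        self.prefix = prefix

    def _flush(self):
        # Collapse runs of whitespace, then emit the block if it has text.
        text = " ".join("".join(self.buffer).split())
        if self.prefix is not None and text:
            self.lines.append(self.prefix + text)
        self.prefix, self.buffer = None, []


def html_to_markdown(html: str) -> str:
    """Hypothetical helper: return the kept blocks joined as Markdown."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)


if __name__ == "__main__":
    page = ("<nav><ul><li>Home</li></ul></nav>"
            "<main><h1>Install</h1><p>Run the <a href='#'>installer</a>.</p>"
            "<ul><li>Step one</li></ul></main>"
            "<footer><p>Copyright</p></footer>")
    print(html_to_markdown(page))
```

Run on the sample page, the sketch keeps the heading, paragraph, and list item and discards the "Home" menu entry and the footer text. An event-driven `html.parser` handler keeps the example dependency-free; note that it does not execute JavaScript, so it would not cover the dynamic and lazy-loaded content the toolkit is described as handling.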

Key Features

  • Handles dynamic content and lazy-loaded elements
  • Provides colorful terminal feedback for status and errors
  • Automatically identifies main content areas and removes irrelevant sections
  • Offers multiple crawling strategies (single URL, multi-URL, sitemap, and menu-based)
  • Outputs clean Markdown and structured JSON
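The sitemap-based strategy and the structured JSON output listed above can be illustrated with a short standard-library sketch. It assumes the sitemap has already been downloaded as text and follows the sitemaps.org `urlset` format (sitemap index files are not handled). The function names, the path-prefix filter, and the `url`/`title`/`content` JSON fields are hypothetical; they are not the toolkit's actual API or output schema.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str, path_prefix: str = "/") -> list[str]:
    """Return <loc> URLs whose path starts with path_prefix (hypothetical filter)."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url and urlparse(url).path.startswith(path_prefix):
            urls.append(url)
    return urls


def to_json_record(url: str, title: str, markdown: str) -> str:
    """Serialize one scraped page; this field layout is an assumed example schema."""
    return json.dumps({"url": url, "title": title, "content": markdown},
                      ensure_ascii=False)


if __name__ == "__main__":
    sitemap = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
               "<url><loc>https://example.com/docs/intro</loc></url>"
               "<url><loc>https://example.com/blog/news</loc></url>"
               "</urlset>")
    # Only pages under /docs/ are kept; each would then be fetched and cleaned.
    for url in urls_from_sitemap(sitemap, "/docs/"):
        print(to_json_record(url, "Intro", "# Intro"))
```

Filtering sitemap entries by path prefix is one simple way to keep a crawl focused on documentation pages and skip blog or marketing URLs; the `example.com` URLs are placeholders.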

Use Cases

  • Preparing documentation for LLM training and RAG systems
  • Creating documentation sites and wikis
  • Building knowledge bases from dependency documentation