Extracts clean, focused documentation from websites for human readers and LLM consumption.
This Python toolkit gathers documentation from websites while keeping only the content that matters. It offers several crawling strategies: single-page, multi-page, sitemap-based, and menu-based. Extracted content is formatted as clean Markdown and structured JSON, so it can feed documentation sites, wikis, knowledge bases, LLM training data, and retrieval-augmented generation (RAG) systems. Because navigation menus, ads, and other irrelevant page elements are stripped out, the output is ready to use as a documentation source without further cleanup.
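To illustrate what "stripping away irrelevant elements" means in practice, here is a minimal sketch using only the standard library. It is not the toolkit's implementation. The tag list, `ContentExtractor`, and `extract` are hypothetical names chosen for this example, and the sketch emits plain text in a JSON-ready dict rather than Markdown.

```python
import json
from html.parser import HTMLParser

# Hypothetical set of tags treated as page chrome rather than documentation content.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects the page title and visible text, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element (handles nesting)
        self.in_title = False
        self.title = ""
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.title += text
        elif self.skip_depth == 0:
            self.chunks.append(text)


def extract(url, html):
    """Hypothetical helper: return a JSON-serializable record of the page's main text."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {"url": url, "title": parser.title, "content": "\n\n".join(parser.chunks)}


if __name__ == "__main__":
    page = (
        "<html><head><title>Guide</title><style>p{}</style></head><body>"
        "<nav><a>Home</a></nav><main><h1>Install</h1><p>Run pip.</p></main>"
        "<footer>(c) 2024</footer></body></html>"
    )
    print(json.dumps(extract("https://example.com/guide", page), indent=2))
```

A real extractor would also need to recognize chrome that is not marked up with semantic tags (for example, `div` elements with menu or ad class names), which is where site-specific heuristics come in.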