What format is the scraped output?

The skill generates a structured directory containing a main SKILL.md, categorized markdown references (API, guides, etc.), and raw JSON page data.

Does it respect website crawling policies?

Yes, the skill includes a grounding checkpoint that verifies target URLs and checks robots.txt to identify rate-limiting requirements before execution begins.
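As a rough illustration of what a robots.txt check of this kind can look like, the sketch below uses Python's standard `urllib.robotparser`. It is not the skill's own implementation; the sample robots.txt content, the `check_robots` helper, the `doc-scraper-example` user agent, and the example URLs are all assumptions made for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would download it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def check_robots(robots_text, url, user_agent="doc-scraper-example"):
    """Return (allowed, crawl_delay) for a URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


allowed, delay = check_robots(ROBOTS_TXT, "https://docs.example.com/guide/intro")
blocked, _ = check_robots(ROBOTS_TXT, "https://docs.example.com/private/notes")
print(allowed, delay, blocked)
```

A scraper could use the returned delay as the minimum pause between requests and skip any URL for which the first value is false.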

What is the Documentation Scraper skill?

It is a specialized capability for Claude Code that automates extracting content from documentation websites and organizing it into structured files, which can then serve as AI context and reference material.

Can I filter which parts of a site are scraped?

Yes, the configuration allows you to define include and exclude URL patterns, ensuring you only capture relevant documentation while skipping sections like blogs or changelogs.
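The include and exclude idea can be sketched in a few lines of Python. This is only an assumed model of the behavior: the glob-style pattern syntax, the `should_scrape` helper, and the sample paths are hypothetical, and the skill's real configuration format may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical patterns; the skill's actual pattern syntax is not shown in the listing.
INCLUDE = ["/docs/*", "/api/*"]
EXCLUDE = ["/docs/changelog*", "/blog/*"]


def should_scrape(url, include=INCLUDE, exclude=EXCLUDE):
    """Exclude patterns win; otherwise the path must match at least one include pattern."""
    path = urlparse(url).path
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


print(should_scrape("https://example.com/docs/guide/install"))
print(should_scrape("https://example.com/docs/changelog/v2"))
print(should_scrape("https://example.com/blog/launch-post"))
```

Checking exclusions first keeps a broad include pattern such as `/docs/*` from pulling in a changelog that lives under the same prefix.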

What happens if a scrape is interrupted?

The skill includes a Recovery Protocol and checkpoint support, allowing you to resume interrupted scrapes from the last saved state without starting over.
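One common way to get this kind of resumability is to record each finished URL in a small checkpoint file and skip recorded URLs on the next run. The sketch below shows that general pattern under assumed names (`load_checkpoint`, `save_checkpoint`, `scrape_all`, and a JSON file with a `done` key); it does not describe the skill's actual Recovery Protocol or file format.

```python
import json
from pathlib import Path


def load_checkpoint(path):
    """Return the set of URLs already scraped, or an empty set if no checkpoint exists."""
    path = Path(path)
    if path.exists():
        return set(json.loads(path.read_text())["done"])
    return set()


def save_checkpoint(path, done):
    """Persist the set of completed URLs as sorted JSON."""
    Path(path).write_text(json.dumps({"done": sorted(done)}))


def scrape_all(urls, checkpoint_path, fetch):
    """Fetch each URL once, saving progress after every successful page."""
    done = load_checkpoint(checkpoint_path)
    for url in urls:
        if url in done:
            continue  # already scraped in an earlier run
        fetch(url)  # may raise; progress up to this point is already saved
        done.add(url)
        save_checkpoint(checkpoint_path, done)
    return done
```

Because the checkpoint is written after every page, an interruption loses at most the page that was in flight, at the cost of one small file write per page.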

Documentation Scraper

Name: Documentation Scraper
Author: jmagly

Web Scraping and Data Collection

Scrapes documentation websites and transforms them into organized, categorized reference files for AI context and offline archives.

The Documentation Scraper skill turns documentation websites into structured, AI-ready reference material. Using configurable selector mapping and URL pattern matching, it extracts guides, API references, and tutorials into categorized files compatible with Claude Code and other agentic workflows. Grounding checks and error-recovery protocols help it cope with rate limiting and complex site structures, so the resulting context is suitable for autonomous coding tasks and custom skill development.

Key Features

01 Smart URL pattern filtering to include specific guides and exclude irrelevant sections like blogs

02 Configurable content selectors for precise scraping of titles, code blocks, and main content

03 Integrated grounding checks for robots.txt compliance and rate-limiting safety

04 67 GitHub stars

05 Built-in recovery protocol to handle connection errors and selector mismatches automatically

06 Automated extraction of web documentation into organized reference structures

Use Cases

01 Converting online API documentation into structured offline markdown references

02 Generating local context libraries for Claude Code to improve agentic coding accuracy

03 Bootstrapping new Claude skills by scraping project-specific framework documentation

Install with 🐟 Skill.Fish

npx skillfish add jmagly/ai-writing-guide doc-scraper

For use in Claude.ai and ChatGPT