Do I need an API key to use this skill?

No. An API key is required only if you use the optional '--summarize' flag to generate AI summaries via Gemini.

Where does the skill save the scraped content?

By default, the extracted markdown files are saved to the 'output/scraped/' directory within your project.

Does it preserve code blocks and technical formatting?

Yes, the scraper is designed to preserve headings, links, lists, and code blocks while removing non-essential layout elements.

Can I scrape multiple websites at the same time?

Yes, you can provide a text file containing a list of URLs (one per line) using the '--file' flag for batch processing.
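
As a rough illustration of the one-URL-per-line format, the sketch below parses such a list in Python. The function names `parse_url_lines` and `read_url_list` are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behavior of the skill.

```python
from pathlib import Path


def parse_url_lines(text):
    """Return the non-empty lines of `text`, stripped of surrounding whitespace."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # assumption: blank lines are ignored
            urls.append(line)
    return urls


def read_url_list(path):
    """Read a UTF-8 text file and return its URLs, one per line."""
    return parse_url_lines(Path(path).read_text(encoding="utf-8"))
```

A file with two URLs separated by an empty line would yield a list of two strings, which a batch run could then process one at a time.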

What metadata is included in the output?

Every markdown file includes YAML frontmatter containing the title, author, source URL, and word count.
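
To make the shape of that frontmatter concrete, here is a minimal Python sketch that builds a header with the four fields named above. The key names (`title`, `author`, `source_url`, `word_count`), the whitespace-based word count, and the function `build_frontmatter` are all assumptions for illustration; the skill's actual keys and counting method may differ.

```python
import json


def build_frontmatter(title, author, source_url, body):
    """Prepend a YAML frontmatter block to a markdown body."""
    fields = {
        "title": title,
        "author": author,
        "source_url": source_url,           # hypothetical key name
        "word_count": len(body.split()),    # assumption: whitespace-separated words
    }
    lines = ["---"]
    for key, value in fields.items():
        # json.dumps emits a double-quoted string or a bare integer,
        # both of which are valid YAML scalars.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body
```

Quoting every string value sidesteps YAML pitfalls such as titles that contain a colon.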

Web to Markdown Scraper

Name: Web to Markdown Scraper
Author: cdeistopened


Web Scraping & Data Collection

Extracts clean markdown content from web pages with optional AI-powered summarization and metadata extraction.

This skill converts noisy web pages into structured, readable markdown files. It identifies the core content of a page, such as articles and main sections, and strips away navigation menus, ads, and footers. Aimed at researchers and developers, it supports both single-page and batch processing and generates YAML frontmatter for every file. With the optional AI integration, it can also prepend concise summaries and topic tags, which is useful for building knowledge bases or preparing datasets for LLM context.
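
The idea of keeping core content while dropping layout elements can be sketched with Python's standard `html.parser` module. This is not the skill's implementation: the class `MainTextExtractor`, the function `extract_main_text`, and the list of skipped tags are assumptions, and only headings and paragraphs are handled.

```python
from html.parser import HTMLParser

# Tags whose contents are discarded (an assumption of this sketch).
SKIPPED_TAGS = {"nav", "footer", "script", "style", "aside"}
HEADING_TAGS = {"h1", "h2", "h3"}


class MainTextExtractor(HTMLParser):
    """Collect text outside skipped tags, marking headings markdown-style."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in HEADING_TAGS:
            # <h2> becomes "## ", and so on
            self.chunks.append("\n" + "#" * int(tag[1]) + " ")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif self.skip_depth == 0 and (tag in HEADING_TAGS or tag == "p"):
            self.chunks.append("\n")  # end the current block

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_main_text(html):
    """Return the kept text as blank-line-separated blocks."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.chunks).splitlines()]
    return "\n\n".join(line for line in lines if line)
```

A production scraper would also need to handle links, lists, and code blocks, and would more likely rely on a dedicated HTML-to-markdown library than on a hand-written parser.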

Key Features

01 Generates AI-powered summaries and topic tags via Gemini API

02 8 GitHub stars

03 Converts complex HTML into clean, formatted markdown

04 Strips advertisements, navigation, scripts, and footers automatically

05 Extracts rich metadata including Open Graph tags and word counts

06 Supports batch processing of multiple URLs from a single file

Use Cases

01 Building a personal research library or local knowledge base from online sources

02 Archiving blog posts and documentation for offline reading or RAG workflows

03 Summarizing large lists of articles for rapid information synthesis
