Extracts clean markdown content from web pages with optional AI-powered summarization and metadata extraction.
This skill converts noisy web pages into structured, readable markdown files. It identifies the core content, such as articles and main sections, and strips distractions like navigation menus, ads, and footers. It supports both single-page and batch processing and generates YAML frontmatter for every file. With the optional AI integration, it can also prepend a concise summary and topic tags, which makes it useful to researchers and developers building knowledge bases or preparing datasets for LLM context.
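To make the content-extraction idea concrete, here is a minimal sketch of stripping noisy elements and emitting markdown, using only Python's standard-library `html.parser`. It is not the skill's actual implementation: the `SKIP_TAGS` set, the `MarkdownExtractor` class, and the `html_to_markdown` function are hypothetical names chosen for illustration, and only headings, paragraphs, and list items are handled.

```python
from html.parser import HTMLParser

# Assumption: these tags and everything inside them count as noise.
SKIP_TAGS = {"nav", "footer", "script", "style", "aside", "form"}
HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}
BLOCK_TAGS = set(HEADINGS) | {"p", "li"}


class MarkdownExtractor(HTMLParser):
    """Hypothetical extractor: keeps text outside SKIP_TAGS as markdown blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # greater than 0 while inside a skipped subtree
        self.blocks = []      # finished markdown blocks
        self.current = []     # text fragments of the block being built
        self.prefix = ""      # markdown prefix for the block being built

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()
            if tag in HEADINGS:
                self.prefix = HEADINGS[tag] + " "
            elif tag == "li":
                self.prefix = "- "

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)

    def _flush(self):
        # Collapse runs of whitespace and store the block if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current, self.prefix = [], ""


def html_to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n\n".join(parser.blocks)
```

Calling `html_to_markdown` on a page whose body contains a `<nav>`, an `<article>`, and a `<footer>` returns only the article's headings and paragraphs. A production extractor would likely also score candidate containers and handle links, images, and code blocks, which this sketch omits.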
Key Features
01 Generates AI-powered summaries and topic tags via Gemini API
02 8 GitHub stars
03 Converts complex HTML into clean, formatted markdown
04 Strips advertisements, navigation, scripts, and footers automatically
05 Extracts rich metadata including Open Graph tags and word counts
06 Supports batch processing of multiple URLs from a single file
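The metadata and batch features listed above can be sketched as follows. This is an illustration under stated assumptions, not the skill's real code: the `og_`-prefixed frontmatter keys, the one-URL-per-line batch file format with `#` comments, and the names `OpenGraphParser`, `build_frontmatter`, and `read_url_list` are all hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content:
            # Assumed key scheme: "og:title" becomes "og_title".
            self.og["og_" + prop[3:].replace(":", "_")] = content


def build_frontmatter(url, html, markdown):
    """Render a YAML frontmatter block from Open Graph tags and a word count."""
    parser = OpenGraphParser()
    parser.feed(html)
    meta = {"url": url, **parser.og, "word_count": len(markdown.split())}
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, str):
            # Double-quote strings; multi-line values are not handled here.
            value = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


def read_url_list(text):
    """Parse an assumed batch format: one URL per line, blanks and # comments skipped."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

In a batch run, each URL returned by `read_url_list` would be fetched, converted, and written out with the string from `build_frontmatter` prepended to the markdown body. The word count here is a simple whitespace split of the markdown, which also counts markup tokens such as `#`.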
Use Cases
01 Building a personal research library or local knowledge base from online sources
02 Archiving blog posts and documentation for offline reading or RAG workflows
03 Summarizing large lists of articles for rapid information synthesis