Does it work on sites that require JavaScript?

Yes, in many cases. The integration with the Jina Reader and markdown.new proxies lets the skill fetch content from many JavaScript-rendered pages.
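The proxy step can be pictured as rewriting the target URL into a proxy URL. The sketch below assumes the commonly used Jina Reader form (`https://r.jina.ai/` followed by the full target URL) and an analogous `https://markdown.new/` prefix. The markdown.new format, the order in which the proxies are tried, and the `proxy_urls` helper are all assumptions for illustration, not the skill's documented behavior.

```python
# Assumed proxy prefixes: each proxy is addressed by appending the full target
# URL to its base. The markdown.new form and the try order are assumptions.
PROXY_PREFIXES = {
    "markdown.new": "https://markdown.new/",
    "jina": "https://r.jina.ai/",
}


def proxy_urls(target: str) -> list[str]:
    """Return candidate proxy URLs for a target page, in the (assumed) order to try them."""
    if not target.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {target!r}")
    return [prefix + target for prefix in PROXY_PREFIXES.values()]
```

For `https://example.com/post`, this yields `https://markdown.new/https://example.com/post` followed by `https://r.jina.ai/https://example.com/post`.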

Can I use this for password-protected content?

No, this skill is designed for public URLs. Content behind authentication generally requires specialized browser tools or active session management.

Is Trafilatura required to use this skill?

No. Trafilatura is recommended for the best local extraction quality, but the skill also includes fallbacks that work with standard system tools such as curl and lynx.
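Because every local extractor is optional, a fallback chain first needs to know what is installed. Here is a minimal sketch that assumes Trafilatura is detected as an importable Python package and that the other tools are looked up on `PATH`. The `available_extractors` name is hypothetical.

```python
import importlib.util
import shutil


def available_extractors() -> dict[str, bool]:
    """Report which optional local extraction tools can be used here (hypothetical helper)."""
    return {
        # Trafilatura is a Python package: check importability without importing it.
        "trafilatura": importlib.util.find_spec("trafilatura") is not None,
        # The remaining tools are command-line programs looked up on PATH.
        "pandoc": shutil.which("pandoc") is not None,
        "lynx": shutil.which("lynx") is not None,
        "curl": shutil.which("curl") is not None,
    }
```

Calling `[name for name, ok in available_extractors().items() if ok]` lists the tools a fallback chain could actually use on the current machine.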

Why use this instead of the default WebFetch?

The default WebFetch output often includes bloated HTML, navigation menus, and scripts. This skill strips that 'noise' to save tokens and keep the AI focused on the actual content.
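The following simplified sketch uses only the standard library to show what stripping 'noise' means: it drops any text inside elements that are usually page chrome. The `NOISE_TAGS` set and the `strip_chrome` helper are illustrative assumptions. The skill itself relies on proxy services and dedicated extractors, which return markdown rather than the plain text produced here.

```python
from html.parser import HTMLParser

# Elements treated as page "chrome" rather than content (an illustrative choice).
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class ContentText(HTMLParser):
    """Collect visible text, skipping anything nested inside a noise element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # greater than 0 while inside a noise element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Script and style bodies also arrive here, but depth > 0 filters them out.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def strip_chrome(html: str) -> str:
    """Return the non-chrome text of an HTML document, one chunk per line."""
    parser = ContentText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Suppose a page wraps an `<article>` in a `<nav>`, a `<script>`, and a `<footer>`. Only the article's heading and paragraph text survive, and that saving is exactly the token reduction the answer above describes.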

What happens if a website blocks the proxies?

The skill automatically falls back to local tools like Trafilatura, Pandoc, or Lynx to attempt content extraction directly from your environment.
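The local fallback can be sketched as running each installed tool in turn and keeping the first non-empty output. The command lines use documented options (`trafilatura -u <url>`, `pandoc -f html -t markdown <url>`, `lynx -dump -nolist <url>`). The order, the timeout, and the helper names `local_commands` and `extract_locally` are assumptions, and the exact flags the skill passes are not stated in this listing.

```python
import shutil
import subprocess
from typing import Optional


def local_commands(url: str) -> list[list[str]]:
    """Command lines for each local extractor, in an assumed fallback order."""
    return [
        ["trafilatura", "-u", url],                       # main-content extraction (plain text by default)
        ["pandoc", "-f", "html", "-t", "markdown", url],  # pandoc fetches the URL itself
        ["lynx", "-dump", "-nolist", url],                # rendered text dump without the link list
    ]


def extract_locally(url: str, timeout: float = 30.0) -> Optional[str]:
    """Run each installed extractor in turn; return the first non-empty output, else None."""
    for argv in local_commands(url):
        if shutil.which(argv[0]) is None:
            continue  # tool not installed: try the next one
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        except (subprocess.TimeoutExpired, OSError):
            continue  # hung or failed to start: try the next one
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout
    return None
```

Checking `shutil.which` before each run means a missing tool is skipped quietly instead of raising, which matches the answer's point that none of these tools is strictly required.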

Markdown Web Fetcher

Name: Markdown Web Fetcher
Author: jackwillis

Category: Web Scraping & Data Collection

Extracts clean, optimized markdown from URLs by stripping HTML bloat to maximize context window efficiency.

This skill provides a multi-layered strategy for retrieving web content in a clean, AI-friendly markdown format. It removes navigation menus, scripts, and decorative elements through a series of fallback methods, including specialized proxy services and local extractors such as Trafilatura, and this reduces token consumption. As a result, Claude receives mainly the essential text. This can improve the accuracy of summarization, analysis, and documentation lookups and helps prevent context window overflow.

Key Features

01 Local extraction fallbacks using Trafilatura, Pandoc, and Lynx

02 Support for JavaScript-rendered pages via the markdown.new and Jina Reader proxies

03 Automatic browser user-agent simulation to get past bot blocking

04 Five-stage fallback strategy for reliable content extraction

05 Context window optimization by stripping non-essential HTML 'chrome'
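The five-stage strategy amounts to an ordered chain in which each stage either returns text or hands off to the next. The sketch below assumes a placeholder desktop-browser User-Agent string and uses hypothetical helper names (`browser_request`, `fetch_text`, `first_success`). It wires in only the two proxy stages. The three local extractors would be appended as further `(name, callable)` pairs.

```python
import urllib.request
from typing import Callable, Optional

# Placeholder desktop-browser User-Agent; the exact string the skill sends is
# not documented in this listing, so this value is an assumption.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")

Stage = Callable[[str], Optional[str]]


def browser_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself as a regular browser."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch_text(url: str, timeout: float = 20.0) -> str:
    """Download a URL with the browser User-Agent and decode the body."""
    with urllib.request.urlopen(browser_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def first_success(url: str, stages: list[tuple[str, Stage]]) -> tuple[str, str]:
    """Try each (name, stage) in order; return (name, text) for the first non-empty result."""
    errors = []
    for name, stage in stages:
        try:
            text = stage(url)
        except Exception as exc:  # a failing stage just hands over to the next one
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all stages failed: " + "; ".join(errors))


# Hypothetical proxy stages (prefix formats assumed); local stages would follow.
PROXY_STAGES: list[tuple[str, Stage]] = [
    ("markdown.new", lambda u: fetch_text("https://markdown.new/" + u)),
    ("jina", lambda u: fetch_text("https://r.jina.ai/" + u)),
]
```

Calling `first_success("https://example.com", PROXY_STAGES)` returns the name of the stage that succeeded along with its text. Collecting per-stage errors makes it easier to see why every stage failed when none does.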

Use Cases

01 Summarizing long-form articles or blog posts without wasting tokens on UI elements

02 Extracting clean text from complex, script-heavy websites for data analysis

03 Referencing online documentation or API guides during development sessions
