Extracts clean, optimized markdown from URLs by stripping HTML bloat to maximize context window efficiency.
This skill retrieves web content as clean, AI-friendly markdown using a multi-layered fallback strategy. It tries a series of methods in turn, including specialized proxy services and local extractors such as Trafilatura, and strips navigation menus, scripts, and decorative elements, which reduces token consumption. Claude then receives only the essential text, which improves the accuracy of summarization, analysis, and documentation referencing and helps prevent context window overflow.
Key Features
01 1 GitHub star
02 Local extraction fallbacks using Trafilatura, Pandoc, and Lynx
03 Support for JavaScript-rendered pages via markdown.new and Jina proxies
04 Automated browser user-agent simulation to bypass bot-blocking
05 Five-stage fallback strategy for reliable content extraction
06 Context window optimization by stripping non-essential HTML 'chrome'
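The fallback strategy and user-agent simulation listed above can be illustrated with a minimal Python sketch. It is not the skill's actual implementation: the listing does not state the order or exact contents of the five stages, so the stage names, the `extract_markdown` and `build_request` helpers, and the `BROWSER_UA` string below are all hypothetical. The sketch only shows the general pattern of trying each extractor in order and accepting the first non-empty result.

```python
from typing import Callable, Iterable, Optional, Tuple
from urllib.request import Request

# Hypothetical browser-like User-Agent string (an assumption, not taken from the skill).
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# A stage takes a URL and returns markdown text, or None/empty on failure.
Extractor = Callable[[str], Optional[str]]


def build_request(url: str) -> Request:
    """Build a request that presents a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": BROWSER_UA})


def extract_markdown(
    url: str, stages: Iterable[Tuple[str, Extractor]]
) -> Tuple[str, str]:
    """Try each (name, extractor) stage in order.

    Returns (stage_name, markdown) for the first stage that yields
    non-empty text. Raises RuntimeError if every stage fails.
    """
    errors = []
    for name, stage in stages:
        try:
            text = stage(url)
        except Exception as exc:  # a failing stage must not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text.strip()
        errors.append(f"{name}: empty result")
    raise RuntimeError("all stages failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Stand-in stages for demonstration; real ones would call a proxy
    # service or a local extractor.
    def blocked_proxy(url: str) -> Optional[str]:
        raise OSError("blocked by remote host")

    def local_extractor(url: str) -> Optional[str]:
        return "# Example\n\nBody text."

    stage_name, markdown = extract_markdown(
        "https://example.com",
        [("proxy", blocked_proxy), ("local", local_extractor)],
    )
    print(stage_name)
    print(markdown)
```

Keeping each stage behind the same callable signature means a proxy service or local tool could be added, removed, or reordered without changing the chain itself.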
Use Cases
01 Summarizing long-form articles or blog posts without wasting tokens on UI elements
02 Extracting clean text from complex, script-heavy websites for data analysis
03 Referencing online documentation or API guides during development sessions