Extracts clean, readable content from web articles and blog posts by removing ads, navigation menus, and distracting clutter.
The Article Extractor skill lets Claude convert cluttered web pages into clean, distraction-free text files. It uses a prioritized toolchain, starting with Mozilla Readability (reader) and then Trafilatura, to strip headers, sidebars, newsletter pop-ups, and advertisements while preserving the article's prose and headings. This makes it useful for researchers and developers who need to archive web content, build local knowledge bases, or prepare clean datasets for analysis without HTML boilerplate.
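In practice, a prioritized toolchain means trying each extractor in order and keeping the first non-empty result. The sketch below shows that pattern under a few assumptions. The function names (extract_with_fallback, naive_extractor, trafilatura_extractor) are hypothetical and are not the skill's actual code. The reader step is omitted because this sketch makes no assumptions about that tool's command-line interface. The dependency-free fallback is a deliberately crude tag filter and is far weaker than Readability or Trafilatura.

```python
# Hypothetical sketch of a prioritized extraction chain; not the skill's actual implementation.
from html.parser import HTMLParser
from typing import Callable, List, Optional, Sequence, Tuple

Extractor = Callable[[str], Optional[str]]

# Tags whose contents the naive fallback treats as clutter (an assumption, not a standard list).
SKIP_TAGS = {"head", "script", "style", "nav", "header", "footer", "aside", "form", "noscript"}
# Block-level tags that should end a line in the plain-text output.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br", "div", "article", "section"}


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def naive_extractor(html: str) -> Optional[str]:
    """Dependency-free fallback: visible text minus obvious boilerplate tags."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any buffered trailing text
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in "".join(collector.parts).splitlines()]
    return "\n".join(line for line in lines if line)


def trafilatura_extractor(html: str) -> Optional[str]:
    import trafilatura  # ImportError if not installed; extract_with_fallback catches it
    return trafilatura.extract(html)


def extract_with_fallback(html: str, extractors: Sequence[Tuple[str, Extractor]]) -> Tuple[str, str]:
    """Return (tool_name, text) from the first extractor that yields non-empty text."""
    errors = []
    for name, func in extractors:
        try:
            text = func(html)
        except Exception as exc:  # missing dependency or parse error: move on to the next tool
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text.strip()
        errors.append(f"{name}: empty result")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Priority order for this sketch; a reader adapter would go first in the real chain.
DEFAULT_CHAIN = [("trafilatura", trafilatura_extractor), ("python-fallback", naive_extractor)]
```

A call such as extract_with_fallback(html, DEFAULT_CHAIN) returns Trafilatura's output when the package is installed and the naive result otherwise. Catching exceptions per tool lets a missing dependency fall through to the next option instead of aborting the whole extraction.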
Key Features
01 Integrated verification and preview of extracted content
02 Intelligent title extraction and filesystem-safe filename generation
03 Prioritized extraction using specialized tools like reader and trafilatura
04 Fallback mechanism using Python for environments with minimal dependencies
05 Automatic removal of ads, navigation, and promotional sidebars
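Title extraction and filesystem-safe filename generation can be sketched using only the standard library. This is a hypothetical illustration, not the skill's code. The names extract_title and safe_filename are invented. The preference order (og:title meta tag, then the first h1, then the title element) is an assumption, chosen because title elements often carry a site-name suffix. The filename rules strip characters that Windows or POSIX filesystems reject and cap the length. Windows reserved names such as CON are not handled.

```python
# Hypothetical sketch of title extraction and safe filename generation; not the skill's actual code.
import re
from html.parser import HTMLParser
from typing import List, Optional


class _TitleFinder(HTMLParser):
    """Records the og:title meta content, the <title> text, and the first <h1> text."""

    def __init__(self) -> None:
        super().__init__()
        self.og_title: Optional[str] = None
        self.title_parts: List[str] = []
        self.h1_parts: List[str] = []
        self._current: Optional[str] = None  # "title" or "h1" while inside one
        self._h1_done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:title" and attrs.get("content"):
            self.og_title = self.og_title or attrs["content"]
        elif tag == "title":
            self._current = "title"
        elif tag == "h1" and not self._h1_done:
            self._current = "h1"

    def handle_endtag(self, tag):
        if tag == "h1" and self._current == "h1":
            self._h1_done = True  # only the first h1 is used
        if tag in ("title", "h1"):
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title_parts.append(data)
        elif self._current == "h1":
            self.h1_parts.append(data)


def extract_title(html: str) -> Optional[str]:
    """Return the best available title (assumed order: og:title, h1, <title>), or None."""
    finder = _TitleFinder()
    finder.feed(html)
    finder.close()
    for candidate in (finder.og_title, "".join(finder.h1_parts), "".join(finder.title_parts)):
        if candidate and candidate.strip():
            return " ".join(candidate.split())  # normalize internal whitespace
    return None


# Characters rejected by Windows and/or POSIX filesystems, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: Optional[str], max_length: int = 100, default: str = "article") -> str:
    """Turn a title into a filename stem: invalid chars become spaces, length is capped."""
    cleaned = _INVALID_CHARS.sub(" ", title or "")
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")  # Windows dislikes trailing dots/spaces
    cleaned = cleaned[:max_length].rstrip(" .")
    return cleaned or default
```

For example, safe_filename(extract_title(html)) + ".txt" yields a usable output path even when the page has no title, because the default stem takes over.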
Use Cases
01 Scraping news articles for clean text analysis or RAG pipelines
02 Archiving web-based documentation for offline reference
03 Converting online tutorials and blog posts into readable markdown or text files