Can it handle sites with heavy JavaScript or paywalls?

It may struggle with JavaScript-only SPAs or authenticated content; in such cases, the skill will explicitly notify you rather than saving empty or broken output.
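The notify-instead-of-save behavior described above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the function names `looks_extracted` and `save_or_notify` and the 50-word threshold are assumptions made for the example.

```python
def looks_extracted(text: str, min_words: int = 50) -> bool:
    """Return True if the text plausibly contains an article body.

    The 50-word threshold is an assumed heuristic, not a documented value.
    """
    return len(text.split()) >= min_words


def save_or_notify(text: str, path: str) -> str:
    """Write the text to `path`, or return a warning without writing anything."""
    if not looks_extracted(text):
        # Nothing is written, so no empty or broken file is left behind.
        return "Extraction failed: the page may require JavaScript or a login."
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return f"Saved {path}"
```

A word-count check is only one possible signal; a real implementation might also look at the tool's exit status.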

Does it remove comments and social media buttons?

Yes, the skill is configured to suppress comments, social sharing widgets, and other non-article elements to ensure the output is pure content.

How does it name the saved files?

It automatically extracts the article's title, sanitizes it by removing illegal characters, and saves it as a .txt file for easy identification.
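A filename sanitizer along these lines might look like the sketch below. The function name `title_to_filename`, the exact set of stripped characters, the 100-character cap, and the `article` fallback name are all assumptions for illustration; the skill's own rules may differ.

```python
import re


def title_to_filename(title: str, max_len: int = 100) -> str:
    """Turn an article title into a filesystem-safe .txt filename."""
    # Drop characters that are illegal on common filesystems (assumed set).
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "", title)
    # Collapse runs of whitespace and trim stray spaces or dots at the ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Cap the length, then trim again in case the cut left a trailing space.
    cleaned = cleaned[:max_len].rstrip(" .")
    # Fall back to a generic name if nothing usable remains.
    return (cleaned or "article") + ".txt"
```

For example, `title_to_filename("Hello: World?")` returns `Hello World.txt`.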

What tools does the Article Extractor use?

The skill prioritizes specialized extraction tools like 'reader' (Mozilla Readability) and 'trafilatura', falling back to a custom Python script if these are not installed.
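The prioritized selection described above can be sketched as a simple PATH lookup. This is an assumption about how the choice could be made, not the skill's actual logic; `pick_extractor` and the `python-fallback` label are hypothetical names, and the `which` parameter exists only to make the function easy to test.

```python
import shutil
from typing import Callable, Optional

# Tools in the priority order stated above.
PREFERRED_TOOLS = ("reader", "trafilatura")


def pick_extractor(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first preferred tool found on PATH, else a fallback label."""
    for tool in PREFERRED_TOOLS:
        if which(tool):
            return tool
    # Neither tool is installed: use the custom Python script instead.
    return "python-fallback"
```

Calling `pick_extractor()` with no arguments checks the real PATH via `shutil.which`.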

Article Extractor

Name: Article Extractor
Author: I-Onlabs

Web Scraping & Data Collection

Extracts clean, readable content from web articles and blog posts by removing ads, navigation menus, and distracting clutter.

The Article Extractor skill lets Claude convert cluttered web pages into clean, distraction-free text files. It uses a prioritized toolchain, starting with Mozilla Readability (reader) and Trafilatura, to strip headers, sidebars, newsletter popups, and advertisements while preserving the article's prose and headings. The skill is particularly useful for researchers and developers who need to archive web content, build local knowledge bases, or prepare clean datasets for further analysis without HTML boilerplate.

Key Features

01 2 GitHub stars

02 Integrated verification and preview of extracted content

03 Intelligent title extraction and filesystem-safe filename generation

04 Prioritized extraction using specialized tools like reader and trafilatura

05 Fallback mechanism using Python for environments with minimal dependencies

06 Automatic removal of ads, navigation, and promotional sidebars
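As a rough idea of what a dependency-free Python fallback could look like, the sketch below keeps heading and paragraph text and skips common clutter containers, using only the standard library. It is an assumed illustration, not the skill's actual script: the class `ArticleText`, the function `extract_text`, and both tag sets are hypothetical choices.

```python
from html.parser import HTMLParser

# Assumed tag sets: containers to ignore and blocks to keep.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
KEEP_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleText(HTMLParser):
    """Collect text from headings and paragraphs outside skipped containers."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # nesting level inside skipped containers
        self.keeping = False  # True while inside a heading or paragraph
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Normalize whitespace and store the block if it is non-empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in KEEP_TAGS and self.skip_depth == 0:
            self._flush()  # also closes an unterminated previous block
            self.keeping = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in KEEP_TAGS:
            self._flush()
            self.keeping = False

    def handle_data(self, data):
        if self.keeping and self.skip_depth == 0:
            self._buf.append(data)


def extract_text(html: str) -> str:
    """Return kept blocks separated by blank lines."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    parser._flush()
    return "\n\n".join(parser.blocks)
```

A tag-based filter like this is far cruder than Readability or Trafilatura, which is consistent with it being a last resort.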

Use Cases

01 Scraping news articles for clean text analysis or RAG pipelines

02 Archiving web-based documentation for offline reference

03 Converting online tutorials and blog posts into readable markdown or text files
