Can it extract structured data from APIs?

Yes. Strategy 4 lets the skill intercept network requests (such as GraphQL or REST calls) and read their JSON payloads directly, which typically yields cleaner data than DOM scraping.
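As a rough illustration of why intercepted payloads are easier to work with than scraped markup, the sketch below filters a list of already-captured responses down to the JSON ones and parses them. It does not show how the skill itself performs interception; the record shape (`url`, `content_type`, `body`) and the function name `extract_json_payloads` are assumptions made for this example.

```python
import json


def extract_json_payloads(captured: list[dict]) -> list[dict]:
    """Return the parsed bodies of captured responses that carry JSON.

    `captured` is a hypothetical list of records with "url",
    "content_type", and "body" keys; it is not the skill's real format.
    """
    payloads = []
    for response in captured:
        content_type = response.get("content_type", "")
        if "json" not in content_type.lower():
            continue  # skip HTML, images, scripts, and other non-JSON traffic
        try:
            body = json.loads(response["body"])
        except (KeyError, TypeError, json.JSONDecodeError):
            continue  # missing or malformed body: ignore this response
        # GraphQL responses place results under a top-level "data" key
        # (some REST APIs do the same), so unwrap it when present.
        if isinstance(body, dict) and "data" in body:
            body = body["data"]
        payloads.append({"url": response.get("url"), "payload": body})
    return payloads


captured = [
    {
        "url": "https://example.com/graphql",
        "content_type": "application/json",
        "body": '{"data": {"items": [{"id": 1, "title": "First"}]}}',
    },
    {
        "url": "https://example.com/page",
        "content_type": "text/html",
        "body": "<html></html>",
    },
    {
        "url": "https://example.com/api/users",
        "content_type": "application/json; charset=utf-8",
        "body": '[{"id": 7}]',
    },
]

results = extract_json_payloads(captured)
for item in results:
    print(item["url"], item["payload"])
```

The HTML response is dropped, and the two JSON responses come back as ordinary Python dictionaries and lists, with no selectors or markup parsing involved.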

How does Web Extractor handle virtual scrolling?

The skill identifies the specific scroll container, systematically moves through it in viewport-sized increments, and merges the captured text while deduplicating overlapping sections.
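The merge-and-deduplicate step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the skill's actual code: it assumes each viewport capture is already available as a list of text lines, and the function name `merge_captures` is hypothetical. Each new capture is appended after removing the longest run of lines it shares with the end of the text collected so far.

```python
def merge_captures(captures: list[list[str]]) -> list[str]:
    """Merge per-viewport line captures, dropping lines repeated by overlap."""
    merged: list[str] = []
    for lines in captures:
        # Find the longest suffix of `merged` that equals a prefix of `lines`.
        overlap = 0
        for k in range(min(len(merged), len(lines)), 0, -1):
            if merged[-k:] == lines[:k]:
                overlap = k
                break
        # Append only the lines that were not already captured.
        merged.extend(lines[overlap:])
    return merged


captures = [
    ["Row 1", "Row 2", "Row 3"],
    ["Row 3", "Row 4", "Row 5"],  # overlaps the previous capture by one line
    ["Row 4", "Row 5", "Row 6"],  # overlaps the merged text by two lines
]

merged = merge_captures(captures)
print(merged)
# ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5', 'Row 6']
```

One limitation of this approach: if a document genuinely repeats identical lines exactly at a capture boundary, the repeat is treated as overlap and collapsed, so a real implementation would likely also track scroll offsets.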

What should I do if WebFetch returns a 403 error?

Trigger the Web Extractor skill. It renders the page fully through browser-based automation, which often gets past basic bot detection and the permission errors that plain HTTP fetches run into.

Can this skill scrape text from Unity or WebGL games?

Yes, it uses a specialized strategy involving mouse wheel scrolling and screenshot transcription to extract content that is rendered purely as pixels inside a canvas element.

Does it work on pages requiring authentication?

Yes, as long as you have the site open in a connected Chrome instance where you are authenticated, the skill can read the rendered DOM content directly.

Web Extractor

Name: Web Extractor
Author: Touricks

Web Scraping and Data Collection

Extracts complete text content from complex, dynamically-loaded, and canvas-rendered web pages where standard tools fail.

The Web Extractor skill enables Claude to scrape and read content from modern web environments that standard HTTP fetches cannot access. It specializes in navigating Single Page Applications (SPAs), virtual scrolling documents, and hardware-accelerated canvas elements like Unity WebGL. Its strategies include DOM analysis, scroll-and-capture loops, API interception, and visual transcription from screenshots. Together they allow full-content extraction from platforms like Google Docs, Notion, and Confluence, even when content sits behind authentication or lazy loading.

Key Features

01 Automated navigation and interaction to bypass authentication and lazy-loading barriers

02 Visual text extraction for Unity WebGL, Unreal, and Canvas-based applications

03 Comprehensive scraping for JS-rendered frameworks like React, Vue, and Angular

04 2 GitHub stars

05 API interception to capture structured raw data from network requests

06 Advanced handling of virtual scrolling containers to capture off-screen content

Use Cases

01 Extracting text from game engine-based web builds or WebGL dashboards

02 Capturing full content from infinite-scroll social feeds or news archives

03 Scraping long-form internal documentation from Notion, Confluence, or Feishu
