Web Scraping & Data Collection Agent Skills

Discover Agent Skills for web scraping & data collection. Browse 17 skills for Claude, ChatGPT & Codex.

Academic Literature Sweep

Automates the discovery, extraction, and organization of academic literature for qualitative research and theoretical pattern identification.

Torrent Search & Download

Searches multiple torrent trackers and automates content downloading via magnet links and WebTorrent.

Web Fetch via Gemini CLI

Extracts web page content into clean Markdown using the Gemini CLI as a robust alternative to native browsing tools.

AI Web Research & Scraping

Conducts real-time web searches, deep multi-source research, and high-fidelity page scraping using Perplexity and Firecrawl.

Gemini Deep Research

Executes autonomous multi-step research and information synthesis using the Google Gemini Deep Research Agent.

Web Content Fetcher via Gemini

Fetches and converts web page content into clean Markdown using the Gemini CLI as a robust alternative to native browsing tools.

Article Extractor

Extracts clean, readable content from web articles and blog posts by removing ads, navigation menus, and distracting clutter.

Video Downloader

Downloads high-quality videos and audio from YouTube and other platforms for offline viewing, editing, or archival.

YouTube Transcript Downloader

Fetches and cleans transcripts from YouTube videos using yt-dlp with optional Whisper transcription fallback.

Wayback Machine Checker

Checks the archival status and availability of URLs within the Internet Archive's Wayback Machine.

Deep Research Automation

Automates systematic, multi-agent research workflows to generate validated, structured JSON data from web sources.

Web to Markdown Converter

Converts JavaScript-rendered web pages into clean, readable Markdown files using Puppeteer and the Readability algorithm.

Retail Scraper Debugger & Fixer

Diagnoses and resolves web scraping failures for precious metal retail vendors using Firecrawl and Playwright diagnostics.

SearXNG Local Search

Deploys a local, privacy-respecting metasearch engine to aggregate web, package repository, and code results in structured JSON.

Wayback Machine Explorer

Lists and manages archived snapshots from the Wayback Machine to track website history and recover lost content.

MQL5 Article Extractor

Extracts and organizes technical trading articles and documentation from mql5.com for research and training data collection.

Video Downloader

Downloads high-quality videos and audio from YouTube and other platforms for offline access and archival.

VeriGlow Agent Map

Discovers hidden APIs, browser automation recipes, and structured data extraction patterns for any website to streamline AI agent interactions.

Web Extractor

Extracts complete text content from complex, dynamically-loaded, and canvas-rendered web pages where standard tools fail.

VeriGlow Agent Map

Discovers hidden APIs, structured data functions, and browser automation recipes to streamline web scraping and data extraction for AI agents.

Extruct AI Company Research

Automates company discovery, market research, and lead enrichment using Extruct AI's semantic and Deep Search capabilities.

SEC Filing & Corporate Researcher

Conducts deep-dive research into SEC EDGAR filings to extract financial data, officer information, and risk factor analysis.

Wayback Cache Management

Manages local API response caching for Wayback Machine operations to optimize performance and ensure data freshness.

GitHub User Explorer

Retrieves comprehensive GitHub user and organization profile data including repository counts, follower statistics, and account metadata.

Wayback Machine Newest Capture

Locates and retrieves the most recent archived version of any URL from the Internet Archive's Wayback Machine.

Wayback URL Archiver

Archives URLs to the Internet Archive's Wayback Machine for permanent digital preservation and snapshot tracking.

Zighang AI Job Search

Automates the collection and organization of AI and data-related job listings from Zighang into Obsidian-compatible Markdown.

Zighang Job Scraper

Automates the collection of bookmarked job postings from Zighang and synchronizes them into Obsidian as structured Markdown files.

Wayback Machine Screenshot

Retrieves and manages historical visual snapshots of websites using the Internet Archive's Wayback Machine.

Web Fetch via Gemini CLI

Fetches and converts web content into clean Markdown using the Gemini CLI as a reliable fallback for Claude's native tools.
