Web Scraping & Data Collection Claude Skills

Discover Claude skills in the Web Scraping & Data Collection category. Browse 17 skills to find the right capabilities for your AI workflow.

Academic Literature Sweep

Automates the discovery, extraction, and organization of academic literature for qualitative research and theoretical pattern identification.

Academic Literature Sweep

Automates the discovery, extraction, and organization of academic literature for qualitative research and theoretical pattern extraction.

Torrent Search & Download

Searches multiple torrent trackers and automates content downloading via magnet links and WebTorrent.

Qualitative Literature Orchestrator

Automates the discovery, retrieval, and organization of academic literature for qualitative research and theoretical pattern extraction.

Ethical Web Scraping & Data Extraction

Implements ethical, resilient, and legally compliant web scraping strategies to extract high-quality data while avoiding bot detection.
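One concrete compliance step a skill like this might perform is honoring robots.txt before fetching a page. The sketch below is not the skill's own code. It uses Python's standard `urllib.robotparser` on a robots.txt body that has already been downloaded, so it runs offline. The helper name `check_robots` and the `ExampleBot` user agent are hypothetical.

```python
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt: str, user_agent: str, url: str) -> Tuple[bool, Optional[int]]:
    """Return (allowed, crawl_delay) for url under the given robots.txt text.

    Hypothetical helper: the robots.txt body is passed in as a string so the
    example needs no network access.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded, so
    # can_fetch() answers from them instead of refusing every URL.
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    # crawl_delay() returns None when no Crawl-delay line applies to this agent.
    delay = parser.crawl_delay(user_agent)
    return allowed, delay


ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

print(check_robots(ROBOTS, "ExampleBot", "https://example.com/private/data"))  # (False, 2)
print(check_robots(ROBOTS, "ExampleBot", "https://example.com/blog/post"))     # (True, 2)
```

A polite crawler would skip disallowed URLs and sleep for the returned delay between requests to the same host.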

Efficient Web Scraping

Optimizes data extraction from websites and APIs using specialized Python scripts to maximize performance and minimize token consumption.

Gemini Deep Research

Executes autonomous multi-step research and information synthesis using the Google Gemini Deep Research Agent.

AT Protocol Data Ingest

Extracts and ingests social graph data and content from the AT Protocol and Bluesky into structured formats.

DeepSeek OCR Tool

Converts batches of images and scanned documents into structured markdown files using local DeepSeek-OCR models via Ollama.

Video Downloader

Downloads high-quality videos and audio from YouTube and other platforms for offline access and archival.

Deep Research Professional

Conducts comprehensive market intelligence, company analysis, and competitive research using structured methodologies and automated data collection.

AT Protocol Data Ingest

Orchestrates large-scale data acquisition and ingestion from the Bluesky/AT Protocol social graph for downstream analysis.
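Large-scale ingestion from the AT Protocol usually means following pagination cursors. The following minimal sketch pages through followers via the `app.bsky.graph.getFollowers` XRPC method. It assumes the public AppView host `public.api.bsky.app`, though the skill may use a different host or method. The `fetch` parameter is a hypothetical injection point that lets the loop be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Assumed public AppView endpoint; the skill itself may query another host.
BASE = "https://public.api.bsky.app/xrpc/app.bsky.graph.getFollowers"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_followers(actor: str,
                   fetch: Callable[[str], dict] = http_get_json,
                   limit: int = 100) -> Iterator[dict]:
    """Yield every follower record for `actor`, following the response cursor."""
    cursor: Optional[str] = None
    while True:
        params = {"actor": actor, "limit": limit}
        if cursor:
            params["cursor"] = cursor
        page = fetch(f"{BASE}?{urllib.parse.urlencode(params)}")
        followers = page.get("followers", [])
        yield from followers
        cursor = page.get("cursor")
        # A missing cursor or an empty page means the listing is exhausted.
        if not cursor or not followers:
            break
```

Calling `iter_followers("bsky.app")` with the default `fetch` performs live HTTP requests. Because it is a generator, records can be written to storage as they arrive rather than held in memory.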

Wayback Machine Screenshot

Retrieves and manages historical visual snapshots of websites using the Internet Archive's Wayback Machine.
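Before a historical screenshot can be taken, the matching snapshot has to be located. The sketch below shows how that lookup might work using the Internet Archive's public availability API (`https://archive.org/wayback/available`). It does not render or capture screenshots. The helper names are hypothetical, and the response shape handled here is the `archived_snapshots.closest` object that the API returns.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    """Build an availability query; timestamp is YYYYMMDDhhmmss and may be truncated."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urllib.parse.urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the closest archived snapshot URL from a decoded response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(target: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Network call: ask the API for the snapshot closest to the timestamp."""
    with urllib.request.urlopen(availability_url(target, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

The returned snapshot URL could then be handed to a headless browser to produce the actual screenshot.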

Exa Semantic Search

Performs neural, context-aware web searches and deep research tasks to find high-quality information that keyword matching misses.

MQL5 Article Extractor

Extracts and organizes technical trading articles and documentation from mql5.com for research and training data collection.

AI News Crawler & Summarizer

Crawls global AI news sources to generate deduplicated, Chinese-language summaries in a structured JSON format.
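The deduplication step could be approached several ways. One minimal sketch treats two items as duplicates when their normalized URLs match or their normalized titles match. This sketch covers deduplication only, not summarization. The normalization rules and the field names `title`, `url`, and `summary_zh` are hypothetical. `ensure_ascii=False` keeps Chinese text readable in the JSON output instead of escaping it.

```python
import hashlib
import json
import re
from typing import Dict, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case scheme and host; drop utm_* parameters, the fragment and a trailing slash."""
    parts = urlsplit(url.strip())
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")])
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def title_key(title: str) -> str:
    """Hash of the title with case, punctuation and repeated whitespace removed."""
    cleaned = " ".join(re.sub(r"[^\w\s]", "", title.lower()).split())
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


def dedupe(items: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first item seen for each normalized URL or normalized title."""
    seen_urls, seen_titles, unique = set(), set(), []
    for item in items:
        u, t = normalize_url(item["url"]), title_key(item["title"])
        if u in seen_urls or t in seen_titles:
            continue
        seen_urls.add(u)
        seen_titles.add(t)
        unique.append(item)
    return unique


items = [
    {"title": "Example Lab releases new model", "url": "https://example.com/news/1?utm_source=x", "summary_zh": "示例摘要"},
    {"title": "Example Lab Releases New Model!", "url": "https://mirror.example.org/a", "summary_zh": "重复"},
]
print(json.dumps(dedupe(items), ensure_ascii=False, indent=2))
```

Exact-match keys miss reworded headlines. Catching those would require fuzzy or embedding-based similarity.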

Web to Markdown Converter

Converts JavaScript-rendered web pages into clean, readable Markdown files using Puppeteer and the Readability algorithm.

Scraping Data Pipeline

Orchestrates the extraction, validation, and database loading of comprehensive fighter data from UFCStats.com using Scrapy spiders.

Video Transcript & Media Downloader

Downloads videos, extracts high-quality audio, and generates clean, paragraph-style transcripts from YouTube and other media platforms.

PDF Smart Extractor

Extracts and analyzes large PDF documents locally with semantic chunking to minimize token usage and maximize context efficiency.
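The listing does not describe how this skill's semantic chunking works. As a much simpler stand-in, the sketch below packs whole paragraphs of already-extracted PDF text into chunks under a size budget. It uses word count as a rough proxy for tokens. The function name and the 200-word default are hypothetical, and this sketch performs no PDF parsing.

```python
from typing import List


def chunk_paragraphs(text: str, max_words: int = 200) -> List[str]:
    """Greedily pack whole paragraphs into chunks of at most max_words words.

    Paragraphs are separated by blank lines. A paragraph longer than
    max_words is split on word boundaries so no chunk exceeds the budget.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for para in paragraphs:
        words = para.split()
        if len(words) > max_words:
            # Oversized paragraph: flush the pending chunk, then emit fixed-size slices.
            if current:
                chunks.append("\n\n".join(current))
                current, count = [], 0
            for i in range(0, len(words), max_words):
                chunks.append(" ".join(words[i:i + max_words]))
            continue
        if current and count + len(words) > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping paragraphs intact means each chunk is coherent on its own. A truly semantic chunker would instead split wherever topic similarity between adjacent passages drops.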

Google Custom Search CLI

Integrates Google Programmable Search Engine capabilities directly into Claude Code for programmatic web and image retrieval.
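Programmable Search Engine results are retrieved through the Custom Search JSON API (`https://www.googleapis.com/customsearch/v1`). Requests use the `key` (API key), `cx` (engine ID), and `q` (query) parameters, with `searchType=image` for image results. The minimal sketch below only builds the request and reads the result list. It does not show how this skill wires the API into Claude Code, and the helper names are hypothetical.

```python
import urllib.parse
from typing import List, Tuple

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str,
                     images: bool = False, num: int = 10) -> str:
    """Build a Custom Search JSON API request URL (the API caps num at 10)."""
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    if images:
        params["searchType"] = "image"
    return f"{ENDPOINT}?{urllib.parse.urlencode(params)}"


def extract_links(response: dict) -> List[Tuple[str, str]]:
    """Return (title, link) pairs; 'items' is absent when the query has no results."""
    return [(item.get("title", ""), item["link"]) for item in response.get("items", [])]
```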

URL to Markdown Converter

Converts any webpage into clean, formatted Markdown using Chrome CDP for full JavaScript rendering and metadata extraction.

Borrow Media Search & Download

Searches for media and automates torrent downloads across multiple sources using a local API.

GitHub User Explorer

Retrieves comprehensive GitHub user and organization profile data including repository counts, follower statistics, and account metadata.
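The profile data described here is available from GitHub's REST endpoint `GET https://api.github.com/users/{username}`, which serves both users and organizations. The sketch below fetches that endpoint and trims the response to the fields this listing mentions. It is an illustrative sketch rather than the skill's implementation, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API = "https://api.github.com/users/"


def fetch_user(username: str, token: str = "") -> Dict[str, Any]:
    """Call GET /users/{username}; a token raises the unauthenticated rate limit."""
    req = urllib.request.Request(API + urllib.parse.quote(username),
                                 headers={"Accept": "application/vnd.github+json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Keep repository count, follower statistics and basic account metadata."""
    keys = ("login", "type", "public_repos", "followers", "following", "created_at")
    return {k: profile.get(k) for k in keys}
```

The `type` field distinguishes `"User"` from `"Organization"`, so a single call covers both cases.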

Deep Research Automation

Automates systematic, multi-agent research workflows to generate validated, structured JSON data from web sources.

Competitive Ads Extractor

Extracts and analyzes competitor advertisements from ad libraries to uncover winning messaging, pain points, and creative strategies.

Website Content Extractor

Extracts clean, clutter-free article and blog content from URLs by stripping away ads, navigation, and unnecessary UI elements.
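A crude way to strip page chrome is to ignore text inside elements that usually hold navigation, scripts, and similar clutter, and to keep everything else as paragraph blocks. The sketch below does this with Python's standard `html.parser`. It is a simple tag heuristic, far cruder than Readability-style content scoring, and the tag lists are assumptions rather than this skill's actual rules.

```python
from html.parser import HTMLParser
from typing import List

# Assumed "chrome" elements whose text is discarded.
SKIP_TAGS = {"head", "script", "style", "nav", "header", "footer", "aside", "form", "noscript"}
# Elements that start or end a separate text block.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "div", "article", "section"}


class ArticleText(HTMLParser):
    """Collect visible text that lies outside SKIP_TAGS elements."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside a skipped element
        self.blocks: List[str] = []  # finished text blocks
        self.buffer: List[str] = []  # text of the block being read

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)

    def flush(self) -> None:
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(text)
        self.buffer = []


def extract_text(html: str) -> str:
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.blocks)
```

Depth counting assumes well-formed HTML. A skipped element that is never closed would hide the rest of the page.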

Content Crawler

Quickly extracts Markdown content, read-only, from documentation, blogs, and static websites.

Firecrawl Web Scraper

Extracts in-depth web content, captures screenshots, and parses PDFs using the Firecrawl API.

Web Research Assistant

Orchestrates parallel subagents to perform structured, multi-source web investigations and synthesize findings into comprehensive reports.
