Web Scraping & Data Collection Claude Skills

Discover Claude skills in the Web Scraping & Data Collection category. Browse 17 skills to find the right capabilities for your AI workflow.

Verified Research

Conducts deep-dive technical research by verifying actual source content across GitHub repositories and web documentation.

Financial Research Specialist

Analyzes SEC filings, earnings calls, and market data to extract deep corporate insights and financial narratives.

Investigative Journalism Researcher

Conducts deep investigative research and source verification for documentary-style creative projects and journalism.

YouTube Transcript

Extracts YouTube video transcripts, metadata, and chapters into formatted Markdown files for knowledge management systems.

Automated Document Hunter

Automates the systematic search, retrieval, and organization of primary source documents from free public archives using browser automation.

Official Government Source Researcher

Researches and extracts factual data from official US government agency statements, press releases, and litigation records.

Investigative Researcher

Performs journalism-grade investigative research using primary source analysis, triple-source verification, and evidence-chain mapping.

Biographical Researcher for Music

Conducts deep biographical research to extract humanizing details, quotes, and life trajectories for documentary-style music production.

Advanced Gemini Web Search

Executes comprehensive web searches using the Gemini command to gather real-time data and detailed information.

Gemini Web Search

Empowers Claude with real-time web search capabilities using the Google Gemini CLI to access up-to-date information and documentation.

Web Search Optimizer

Performs intelligent web searches using a prioritized MCP strategy to find the most relevant documentation and live technical data.

Defuddle Web Content Extractor

Extracts clean, clutter-free Markdown from web pages to optimize AI context and reduce token usage.

X Content Extraction and Script Generation

Extracts Twitter posts and comments to organize viewpoints and generate professional narration scripts for content production.

Washington Legislative Tracker

Tracks and analyzes Washington State K-12 education legislation using direct committee-based discovery and automated SOAP API queries.

YouTube Transcript Downloader

Extracts, downloads, and cleans YouTube video transcripts and captions for easy reading and analysis.

School Calendar Data Extractor

Extracts and structures school calendar dates from PDFs and websites to automate camp and childcare planning.

YouTube Video Downloader

Downloads YouTube videos and audio with customizable quality and format settings directly through Claude Code.

LangChain Deep Research

Conducts deep, iterative web research to generate comprehensive reports with verified citations and source tracking.

JSON Data Extraction

Extracts specific data from JSON files efficiently to minimize token usage and improve processing speed.
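The token-saving idea is to pass only the fields you need to the model instead of the whole file. The sketch below is illustrative and is not this skill's implementation. The dotted-path syntax and the `extract_fields` helper are hypothetical.

```python
import json
from typing import Any


def extract_fields(doc: Any, paths: list[str]) -> dict[str, Any]:
    """Return only the values at the given dotted paths (hypothetical helper).

    All-digit path segments index into lists; a path that cannot be resolved maps to None.
    """
    result: dict[str, Any] = {}
    for path in paths:
        node = doc
        for key in path.split("."):
            if isinstance(node, list) and key.isdigit() and int(key) < len(node):
                node = node[int(key)]
            elif isinstance(node, dict) and key in node:
                node = node[key]
            else:
                node = None  # missing key, bad index, or a scalar reached too early
                break
        result[path] = node
    return result


if __name__ == "__main__":
    raw = '{"user": {"name": "Ada", "id": 7, "bio": "long text"}, "items": [{"sku": "A1"}, {"sku": "B2"}]}'
    # Only three small values are printed; "bio" and the rest of the document are never emitted.
    print(json.dumps(extract_fields(json.loads(raw), ["user.name", "items.1.sku", "user.email"])))
    # prints {"user.name": "Ada", "items.1.sku": "B2", "user.email": null}
```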

Scientific PDF Data Extraction

Extracts and validates structured data from scientific literature collections to create analysis-ready datasets for systematic reviews and meta-analyses.

Managing Fighter Images

Orchestrates a multi-source image pipeline to download, validate, and normalize fighter photos from Wikimedia, Sherdog, and Bing.

Scraping Data Pipeline

Orchestrates the extraction, validation, and database loading of comprehensive fighter data from UFCStats.com using Scrapy spiders.

DeepSeek OCR Tool

Converts batches of images and scanned documents into structured markdown files using local DeepSeek-OCR models via Ollama.

Wayback Machine Screenshot

Retrieves and manages historical visual snapshots of websites using the Internet Archive's Wayback Machine.

Wayback URL Archiver

Archives URLs to the Internet Archive's Wayback Machine for permanent digital preservation and snapshot tracking.

SearXNG Local Search

Deploys a local, privacy-respecting metasearch engine to aggregate web, package repository, and code results in structured JSON.
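SearXNG returns structured JSON from its `/search` endpoint when you request `format=json`. That format usually has to be enabled under `search.formats` in `settings.yml` first. The sketch below queries a local instance. The base URL and port are assumptions about your deployment, and the helper names are hypothetical rather than the skill's own.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Assumption: a local SearXNG instance at this address with "json" enabled in search.formats.
SEARXNG_BASE = "http://localhost:8888"


def search_url(query: str, base: str = SEARXNG_BASE) -> str:
    """Build a /search URL that asks for JSON output."""
    return f"{base}/search?" + urllib.parse.urlencode({"q": query, "format": "json"})


def top_results(payload: dict[str, Any], n: int = 5) -> list[dict[str, Any]]:
    """Reduce a SearXNG JSON response to title/url/content for the first n results."""
    return [
        {"title": r.get("title"), "url": r.get("url"), "content": r.get("content")}
        for r in payload.get("results", [])[:n]
    ]


def search(query: str, n: int = 5) -> list[dict[str, Any]]:
    """Live request against the local instance (fails if the server is not running)."""
    with urllib.request.urlopen(search_url(query), timeout=30) as resp:
        return top_results(json.load(resp), n)
```

Separating URL building and response parsing from the network call makes the parsing easy to test offline.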

Wayback Oldest Archive Finder

Retrieves the earliest archived snapshot of any URL from the Wayback Machine to identify a website's original version.
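One way to find the earliest capture is the Wayback Machine's public CDX server at `https://web.archive.org/cdx/search/cdx`. It lists captures oldest first, so requesting a single row should return the earliest one. This is a minimal sketch, not the skill's code. It assumes JSON output, where the first row holds the field names (`urlkey`, `timestamp`, `original`, ...). The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target: str, limit: int = 1) -> str:
    """Build a CDX query; captures are listed oldest first, so limit=1 is the earliest."""
    params = {"url": target, "output": "json", "limit": str(limit)}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def first_capture(rows: list) -> Optional[dict]:
    """Map the first data row of a CDX JSON response onto its header row."""
    if len(rows) < 2:
        return None  # empty response or header only: no captures
    record = dict(zip(rows[0], rows[1]))
    # Replay URL format: /web/<14-digit timestamp>/<original URL>
    record["snapshot_url"] = f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"
    return record


def oldest_snapshot(target: str) -> Optional[dict]:
    """Live request for the earliest capture of `target` (needs network access)."""
    with urllib.request.urlopen(cdx_query_url(target), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return first_capture(json.loads(body) if body.strip() else [])
```

Calling `oldest_snapshot("example.com")` makes one live request. The parsing step is kept separate so that it can be tested without network access.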

GitHub User Explorer

Retrieves comprehensive GitHub user and organization profile data including repository counts, follower statistics, and account metadata.
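The GitHub REST API serves both user and organization profiles from `GET /users/{username}`. The response includes fields such as `type`, `public_repos`, `followers`, `following`, and `created_at`. The sketch below shows that call. It is an assumption about how such a skill could work, not its actual code, and the choice of fields kept by `summarize_profile` is hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_ROOT = "https://api.github.com"


def profile_url(username: str) -> str:
    """GET /users/{username} resolves users and organizations alike."""
    return f"{API_ROOT}/users/{urllib.parse.quote(username)}"


def summarize_profile(data: dict[str, Any]) -> dict[str, Any]:
    """Keep a few headline fields; absent fields come back as None."""
    keys = ("login", "type", "public_repos", "followers", "following", "created_at")
    return {k: data.get(k) for k in keys}


def fetch_profile(username: str) -> dict[str, Any]:
    """Live request; unauthenticated calls are subject to GitHub's rate limit."""
    req = urllib.request.Request(
        profile_url(username),
        headers={"Accept": "application/vnd.github+json", "User-Agent": "profile-sketch"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return summarize_profile(json.load(resp))
```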

Wayback Machine Archive Range

Retrieves and calculates the full historical archive span for any URL using the Wayback Machine.
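Once you have the oldest and newest capture timestamps, computing the archive span is simple date arithmetic. One way to get both timestamps is a CDX query with `limit=1` and another with a negative limit such as `limit=-1`, which the CDX server documents as returning the last results. The sketch below covers only the calculation. It assumes the standard 14-digit `YYYYMMDDhhmmss` Wayback timestamps in UTC, and its helper names are hypothetical.

```python
from datetime import datetime, timezone

WAYBACK_TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit Wayback timestamp


def parse_wayback_ts(ts: str) -> datetime:
    """Parse a timestamp such as '20020120142510' as a UTC datetime."""
    return datetime.strptime(ts, WAYBACK_TS_FORMAT).replace(tzinfo=timezone.utc)


def archive_span(first_ts: str, last_ts: str) -> dict:
    """Return ISO dates for both ends and the whole-day and approximate-year span."""
    first, last = parse_wayback_ts(first_ts), parse_wayback_ts(last_ts)
    if last < first:
        first, last = last, first  # tolerate the arguments being passed in reverse
    days = (last - first).days
    return {
        "first": first.isoformat(),
        "last": last.isoformat(),
        "days": days,
        "years": round(days / 365.25, 1),
    }
```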

Wayback Machine Newest Capture

Locates and retrieves the most recent archived version of any URL from the Internet Archive's Wayback Machine.
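The Wayback Machine availability API at `https://archive.org/wayback/available` returns the most recent available capture when no `timestamp` parameter is given. The result sits under `archived_snapshots.closest` in the JSON response. This is a hedged sketch of that lookup, not the skill's own implementation, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target: str) -> str:
    """Omitting `timestamp` asks for the most recent available capture."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": target})


def newest_from_response(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Extract the closest snapshot, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return {"timestamp": closest.get("timestamp"), "url": closest.get("url"), "status": closest.get("status")}


def newest_capture(target: str) -> Optional[dict[str, Any]]:
    """Live request (needs network access)."""
    with urllib.request.urlopen(availability_url(target), timeout=30) as resp:
        return newest_from_response(json.load(resp))
```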
