Web Scraping & Data Collection Agent Skills

Discover Agent Skills for web scraping & data collection. Browse 17 skills for Claude, ChatGPT & Codex.

FireCrawl Installation & Authentication

Automates the installation and configuration of FireCrawl SDKs and API authentication for web scraping projects.

FireCrawl Cost Tuning

Optimizes FireCrawl operational costs through intelligent tier selection, usage monitoring, and budget-aware implementation strategies.

983

FireCrawl Core Workflow A

Automates the primary web crawling and data extraction process using the FireCrawl API to generate LLM-ready content.

983

Exa Core Secondary Workflow

Executes optimized secondary search and data retrieval tasks using the Exa API to complement primary research workflows.

983

FireCrawl Advanced Troubleshooting

Resolves complex FireCrawl errors using systematic evidence collection and deep-layer diagnostic techniques.

983

FireCrawl Reliability Patterns

Implements robust reliability patterns like circuit breakers, idempotency, and graceful degradation for production-grade FireCrawl integrations.

983

Exa Core Workflow A

Executes the primary integration workflow for the Exa search engine to implement core search and data retrieval features.

983

FireCrawl Core Workflow B

Executes secondary FireCrawl workflows to complement primary data collection and automated web scraping tasks.

983

FireCrawl Rate Limit Handler

Implements robust rate limiting, exponential backoff, and idempotency patterns for FireCrawl API integrations.

982
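As a rough illustration of the exponential backoff pattern this skill names, the sketch below retries a call with doubling delays and full jitter. It is not FireCrawl's SDK or the skill's actual code: `RateLimitError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the exception stands in for whatever the real client raises on an HTTP 429.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the API."""


def backoff_delays(max_retries, base_delay=1.0, cap=30.0):
    """Return the un-jittered delay before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base_delay * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_backoff(func, max_retries=5, base_delay=1.0, cap=30.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponential backoff and full jitter."""
    for delay in backoff_delays(max_retries, base_delay, cap):
        try:
            return func()
        except RateLimitError:
            # Full jitter: wait a random time in [0, delay] so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
    # Final attempt; any RateLimitError now propagates to the caller.
    return func()
```

Passing `sleep` as a parameter keeps the waiting behavior replaceable, so the retry logic can be exercised without real delays.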

Crypto News Aggregator

Aggregates real-time cryptocurrency news from over 50 authoritative sources with advanced filtering and relevance scoring.

982

YouTube Transcript Extractor

Extracts and saves YouTube video subtitles or transcripts to local text files using command-line tools or automated browser interaction.

925

Z.AI Multimodal CLI

Integrates vision analysis, real-time web search, and GitHub exploration capabilities into Claude Code workflows.

896

ZAI CLI Integration

Enhances Claude with real-time web search, vision-based image analysis, and advanced GitHub repository exploration.

825

Vault Protocol Logo Extractor

Extracts and organizes brand logos for DeFi vault protocols by identifying homepage links and automating asset retrieval.

765

Canonical Event Deduplication

Normalizes and merges duplicate data from multiple sources using reputation scoring and semantic hash-based grouping.

585
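A minimal sketch of hash-based grouping with reputation scoring, under stated assumptions: a SHA-256 hash of the normalized title stands in for a semantic hash, and instead of merging fields the sketch simply keeps the record from the most reputable source. The names `content_key` and `deduplicate`, the `title` and `source` fields, and the reputation scores are all hypothetical.

```python
import hashlib
import re


def content_key(title):
    """Hash a normalized title; a simple stand-in for a semantic hash."""
    normalized = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(events, reputation):
    """Group events by content key and keep the one from the most reputable source."""
    best = {}
    for event in events:
        key = content_key(event["title"])
        # Sources missing from the reputation table score 0.0.
        score = reputation.get(event["source"], 0.0)
        if key not in best or score > best[key][0]:
            best[key] = (score, event)
    return [event for _, event in best.values()]
```

For example, two items titled "Bitcoin hits new high!" and "bitcoin hits   new high" normalize to the same key, so only the one whose source has the higher reputation score is kept.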

Twitter Reader

Fetches Twitter/X post content and metadata into clean Markdown format using the Jina.ai API to bypass JavaScript restrictions.

384

Web to Markdown Converter

Transforms web pages into clean, readable Markdown files optimized for AI ingestion and local documentation.

347

Perplexity AI Search

Performs real-time AI web searches with citations using Perplexity models to provide up-to-date information and scientific literature.

324

Dev Opinions Scanner

Aggregates and synthesizes real-world developer perspectives from Hacker News, Reddit, and major technical communities.

313

Ark Research

Researches technical solutions and gathers cross-platform evidence to inform architecture and implementation decisions.

305

Reverse API Engineer

Transforms browser traffic into production-ready Python API clients through automated HAR analysis and code generation.

284

X Research

Conducts real-time agentic research and sentiment analysis across X/Twitter to gather developer insights and industry trends.

253

Recent Topic Research & Sentiment

Researches and synthesizes real-world community discussions from the last 30 days across Reddit, X, and the web.

253

Google Search & Web Access

Enables Claude to search the live web and fetch content from specific URLs to provide up-to-date information.

250

Tavily Web Search & Extraction

Equips Claude with high-performance web search capabilities and deep content extraction tools powered by the Tavily API.

247

arXiv Search & Metadata Retrieval

Automates the retrieval and normalization of academic paper metadata from arXiv to support research pipelines and literature reviews.

236

Reddit Data Connector

Extracts and analyzes Reddit content including posts, comments, subreddits, and user profiles using the public JSON API.

229

Tavily Web Search & Research

Empowers Claude with real-time web search, content extraction, and deep research capabilities using the Tavily API.

197

YouTube Transcript Extractor

Extracts text transcripts and captions from YouTube videos for content analysis, summarization, and documentation.

197

YouTube Transcript Extractor

Extracts and formats transcripts from YouTube videos using URLs or video IDs.

197
