Web Scraping & Data Collection MCP Servers

Discover our curated collection of MCP servers for web scraping and data collection. Browse 1226 servers and find the right MCPs for your needs.

A-Stock Data

277

Provides A-share (China stock market) data to large language models via the Model Context Protocol (MCP).

Web Research

274

Enables Claude to access real-time information from the web for enhanced research capabilities.

Airbnb

265

Searches Airbnb listings and retrieves detailed listing information.

Open Web Search

257

Enables web search across multiple engines without requiring API keys, supporting Bing, Baidu, DuckDuckGo, Brave, Exa, and CSDN.

Deep Research

256

Conducts in-depth, iterative research on any topic using AI-powered search, web scraping, and source evaluation to generate comprehensive reports.

Douyin

255

Extracts watermark-free video links, video captions, and audio transcriptions from share links on Douyin (the Chinese counterpart of TikTok).

Playwright

245

Enables LLMs to interact with web pages through Playwright browser automation.

Nodemw

241

Provides a Node.js client for interacting with the MediaWiki API and WikiData.

Selenium

241

Automates browser interactions through the Model Context Protocol using Selenium WebDriver.

GPT Researcher

226

Enables LLM applications to perform in-depth research through MCP.

12306

225

Provides a high-performance backend system for querying China Railway 12306 train ticket information using the Model Context Protocol (MCP).

PDF Reader

218

Securely reads and extracts text, metadata, and page counts from PDF files (local or URL) for use by AI agents.

CoexistAI

213

Automates and simplifies diverse research workflows by integrating large language models with web search, social media, mapping, and code exploration.

Puppeteer

208

Automates browser interactions through Puppeteer for both new and existing Chrome instances.

Reddit Content

207

Fetches and analyzes content from Reddit, providing access to hot threads and post details.

G-Search

196

Enables parallel Google searches with multiple keywords using a Playwright-powered MCP server.

Paper Search

190

Searches and downloads academic papers from multiple sources, such as arXiv, PubMed, and bioRxiv.

MCPBench

182

Evaluates the performance of MCP servers for web search and database query tasks.

SearXNG

180

Integrates the SearXNG API to provide web search capabilities within an MCP environment.

Rag Web Browser

179

Enables AI agents and LLMs to interact with the web and extract information from web pages via the RAG Web Browser Actor.

Showing 20 of 1226 results
