Web Scrapper
by JustAzul
Extracts the main content of web pages as Markdown, plain text, or HTML over stdio/JSON-RPC, for use by AI and automation tools.
About
Web Scrapper is a Python-based headless web scraping service built for integration with AI and automation workflows. It drives a headless browser with Playwright, parses the page with BeautifulSoup, and converts HTML to Markdown with Markdownify, returning the primary content of a page as Markdown, plain text, or HTML. It runs as an MCP (Model Context Protocol) server that communicates over stdio using JSON-RPC, so AI models, IDEs, and other automation platforms can request live web content on demand.
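The idea of "extracting the primary content" can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: it swaps Python's built-in `html.parser` in for BeautifulSoup, skips Playwright rendering entirely (so it only sees static HTML), and assumes a simple heuristic in which text inside `<article>` or `<main>` counts as primary content, falling back to all visible text. The names `MainContentParser` and `extract_main_text` are hypothetical.

```python
from html.parser import HTMLParser

# Sketch only: stdlib html.parser stands in for BeautifulSoup, and there is no
# Playwright rendering, so JavaScript-generated content is never seen.

# Tags whose text is treated as non-content (assumed heuristic).
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer"}
# Tags assumed to wrap a page's primary content.
MAIN_TAGS = {"article", "main"}


class MainContentParser(HTMLParser):
    """Collects visible text, tracking separately what sits inside <article>/<main>."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element
        self.main_depth = 0  # > 0 while inside a MAIN_TAGS element
        self.all_text: list[str] = []
        self.main_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in MAIN_TAGS:
            self.main_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in MAIN_TAGS and self.main_depth:
            self.main_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return  # ignore whitespace and text inside skipped elements
        self.all_text.append(text)
        if self.main_depth:
            self.main_text.append(text)


def extract_main_text(html: str) -> str:
    """Return <article>/<main> text as plain text, else all visible text."""
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    chunks = parser.main_text or parser.all_text
    return "\n".join(chunks)
```

For example, `extract_main_text("<nav>Menu</nav><article><h1>Title</h1><p>Body.</p></article>")` returns `"Title\nBody."`. Converting the retained HTML to Markdown, which the description attributes to Markdownify, is outside the scope of this sketch.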
Key Features
- Headless browser scraping with Playwright, with content extraction via BeautifulSoup and Markdownify
- Outputs content in Markdown, text, or HTML formats
- Dockerized with pre-built images for easy deployment
- Error handling for timeouts, HTTP errors, and Cloudflare challenges
- Designed for MCP (Model Context Protocol) stdio/JSON-RPC integration
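A client talks to an MCP server over stdio by writing JSON-RPC 2.0 messages to the server's stdin, one per line, and reading responses from its stdout. The sketch below builds an MCP `tools/call` request and unpacks the text content of the result. The tool name `scrape_url` and the `url`/`output_format` argument keys are hypothetical placeholders, since this server's actual tool schema is not given here (a client can discover it with `tools/list`). Whether scraping failures such as timeouts come back as `isError` results is also an assumption. A real session must first complete MCP's `initialize` handshake, which this sketch omits.

```python
import json
from typing import Any

# Hypothetical tool name: query the server's `tools/list` for the real one.
TOOL_NAME = "scrape_url"


def build_tool_call(request_id: int, url: str, output_format: str = "markdown") -> str:
    """Return one newline-terminated JSON-RPC 2.0 `tools/call` message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": TOOL_NAME,
            # Hypothetical argument keys; the real schema may differ.
            "arguments": {"url": url, "output_format": output_format},
        },
    }
    # MCP's stdio transport expects one message per line, so serialize
    # compactly; json.dumps escapes any newlines inside string values.
    return json.dumps(message, separators=(",", ":")) + "\n"


def parse_tool_result(line: str) -> str:
    """Extract the text content from a `tools/call` response line."""
    response: dict[str, Any] = json.loads(line)
    if "error" in response:
        # Protocol-level failure, e.g. an unknown tool or invalid params.
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    texts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    if result.get("isError"):
        # Assumed: tool-level failures (timeouts, HTTP errors) set isError.
        raise RuntimeError("\n".join(texts) or "tool reported an error")
    return "\n".join(texts)
```

Keeping request construction and response parsing as pure functions means the same code works whether the lines are exchanged with a subprocess's pipes, for example one started with `subprocess.Popen`, or replayed from a log in tests.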
Use Cases
- Integrating web scraping capabilities directly into AI-powered IDEs and desktop applications
- Providing AI models with real-time web content for analysis or summarization
- Automating data extraction from web pages