Server Fetch
Created by MaartenSmeets
Fetches content from the internet using browser automation and multiple extraction methods.
About
Server Fetch is a Model Context Protocol (MCP) server that lets large language models retrieve and process web content, including content from complex web pages. It uses browser automation, OCR, and several extraction techniques to handle JavaScript-rendered pages and anti-scraping measures. A scoring system compares the results of the extraction methods and selects the highest-quality content.
Key Features
- Multiple content extraction methods (HTML parsing, document parsing, markdown conversion)
- Browser automation with undetected-chromedriver
- Automated handling of cookie consent banners
- OCR with layout detection using pytesseract
- Sophisticated scoring system for selecting the best content
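The idea of scoring several extraction results and keeping the best one can be illustrated with a minimal sketch. This is not Server Fetch's actual scoring logic, which the listing does not describe; the `Candidate` class, the `score` heuristic, and the method labels are all hypothetical, chosen only to show the general pattern.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One extraction result. The method labels used below are illustrative."""
    method: str
    text: str


def score(text: str) -> float:
    """Hypothetical heuristic: count alphabetic words, scaled by the share
    of characters that are alphanumeric or whitespace (penalizes markup debris)."""
    if not text.strip():
        return 0.0
    words = re.findall(r"[A-Za-z]{2,}", text)
    clean_chars = sum(ch.isalnum() or ch.isspace() for ch in text)
    clean_ratio = clean_chars / len(text)
    return len(words) * clean_ratio


def select_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring candidate, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c.text))


if __name__ == "__main__":
    results = [
        Candidate("html", "<div><span></span></div>"),
        Candidate("markdown", "Server Fetch retrieves web content for language models."),
        Candidate("ocr", ""),
    ]
    best = select_best(results)
    print(best.method)  # prints: markdown
```

A real scorer would likely weigh more signals (for example text length, structure, or OCR confidence), but the selection step stays the same: score each candidate independently, then take the maximum.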
Use Cases
- Extracting text from images within web pages using OCR
- Enabling LLMs to retrieve and process content from JavaScript-heavy websites
- Retrieving content from websites that employ anti-scraping techniques