Puppeteer Vision
Created by djannot
Scrapes webpages and converts them to markdown, leveraging AI to handle interactive elements automatically.
About
Puppeteer Vision scrapes webpages, extracts the main content using Mozilla's Readability, and converts it to well-formatted Markdown. It uses vision models to interact with the page and automatically handle cookie consent banners, CAPTCHAs, paywalls, and other interactive elements that block content, so that the content behind them can be extracted. It integrates with MCP-compatible LLM orchestrators, enabling automated web scraping and data collection workflows.
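The vision-driven interaction described above can be pictured as a small loop: take a screenshot, ask a model what to do, act, and repeat until nothing blocks the content. The following is a minimal sketch of that idea, not the project's actual implementation. The `PageLike` and `VisionModel` interfaces, the `Action` type, and the `clearObstacles` function are all hypothetical names introduced for illustration; a real setup would back `PageLike` with a Puppeteer page and `VisionModel` with a call to a vision-capable model.

```typescript
// Hypothetical types for illustration only; none of these names come from Puppeteer Vision.
type Action =
  | { kind: "click"; x: number; y: number } // click at viewport coordinates
  | { kind: "done" };                       // nothing is blocking the content

interface PageLike {
  screenshot(): Promise<string>;              // e.g. a base64-encoded image
  click(x: number, y: number): Promise<void>;
  html(): Promise<string>;
}

interface VisionModel {
  // Inspects a screenshot and proposes the next action.
  nextAction(screenshot: string): Promise<Action>;
}

// Ask the model for actions until it reports "done" or the step budget
// is exhausted, then return the page HTML and the number of clicks made.
async function clearObstacles(
  page: PageLike,
  model: VisionModel,
  maxSteps = 5,
): Promise<{ html: string; steps: number }> {
  let steps = 0;
  while (steps < maxSteps) {
    const action = await model.nextAction(await page.screenshot());
    if (action.kind === "done") break;
    await page.click(action.x, action.y);
    steps++;
  }
  return { html: await page.html(), steps };
}
```

The step budget is a design choice in this sketch: it keeps the loop from running indefinitely when the model keeps proposing clicks that never clear the obstacle. The returned HTML would then be passed on to content extraction and Markdown conversion.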
Key Features
- AI-powered interaction for bypassing website obstacles
- 4 GitHub stars
- Converts HTML to Markdown with custom formatting
- Uses Puppeteer with stealth mode for scraping
- Supports stdio and SSE communication modes
- Extracts main content with Mozilla's Readability
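As an illustration of what the stdio mode implies for a client, MCP messages are JSON-RPC 2.0 objects, and over stdio each message is written as a single line of JSON. The sketch below builds a `tools/call` request and frames it for stdio. The tool name `scrape-webpage` and the `url` argument are assumptions made for this example and may not match the names the server actually exposes; `buildToolCall` and `frameForStdio` are likewise hypothetical helpers.

```typescript
// Shape of an MCP tool invocation (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for the given tool and arguments.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// stdio framing: one JSON message per line, terminated by a newline.
// JSON.stringify without indentation never emits a raw newline.
function frameForStdio(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

// "scrape-webpage" and "url" are assumed names, used here for illustration only.
const request = buildToolCall(1, "scrape-webpage", { url: "https://example.com" });
const line = frameForStdio(request);
```

In SSE mode the same JSON-RPC payload would travel over HTTP rather than the process's standard input, so only the framing step would change.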
Use Cases
- Automated web content extraction for LLM tools
- Collecting data from websites with complex interactive elements
- Bypassing paywalls and login walls programmatically