About
Puppeteer Vision scrapes webpages, extracts the main content with Mozilla's Readability, and converts it to Markdown. It uses vision models to interact with the page automatically, handling cookie consent banners, CAPTCHAs, paywalls, and other interactive elements that block content. It works with MCP-compatible LLM orchestrators, so it can be used in automated web scraping and data collection workflows.
Key Features
- AI-powered interaction for bypassing website obstacles
- Converts HTML to Markdown with custom formatting
- Uses Puppeteer with stealth mode for scraping
- Supports stdio and SSE communication modes
- Extracts main content with Mozilla's Readability
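To make the HTML-to-Markdown step concrete, here is a minimal sketch using only Python's standard library. It is not Puppeteer Vision's own converter, which is not shown in this listing. The names `MarkdownSketch` and `html_to_markdown` are hypothetical, and the sketch assumes the main content has already been extracted and handles only a small subset of tags.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical converter for a small HTML subset (h1-h3, p, ul/ol, li, a, strong, em)."""

    def __init__(self):
        super().__init__()
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            # Blank line before a list; ordered lists are emitted as bullets here.
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "a" and self.href is not None:
            self.parts.append("](" + self.href + ")")
            self.href = None

    def handle_data(self, data):
        # Collapse runs of whitespace to a single space, as browsers do.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    """Convert an HTML fragment to Markdown (assumption: content is already extracted)."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r" *\n *", "\n", text)    # drop spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip()


if __name__ == "__main__":
    sample = (
        "<h1>Example</h1>\n"
        '<p>Read the <a href="https://example.com/docs">docs</a> '
        "for <strong>details</strong>.</p>\n"
        "<ul><li>one</li><li>two</li></ul>"
    )
    print(html_to_markdown(sample))
```

A real converter also has to deal with nested lists, tables, code blocks, and images, which this sketch ignores.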
Use Cases
- Automated web content extraction for LLM tools
- Collecting data from websites with complex interactive elements
- Bypassing paywalls and login walls programmatically