This tool is an MCP server that lets AI assistants read web content by converting URLs and raw HTML into clean, structured Markdown. Built on the `urltomarkdown` library, it uses JSDOM for DOM parsing, Mozilla's Readability to extract the main content and strip noise such as navigation and ads, and Turndown for HTML-to-Markdown conversion. The resulting Markdown is well suited to summarization, analysis, data extraction, and ingestion into AI workflows, including Retrieval-Augmented Generation (RAG) systems.
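To make the client side of this concrete: MCP clients invoke server tools with JSON-RPC 2.0 `tools/call` requests, and over the stdio transport each message is sent as a single line of JSON. The sketch below only builds and frames such a request. The tool name `url_to_markdown` and the `url` argument are hypothetical placeholders, since this description does not list the server's actual tool names or parameters.

```typescript
// JSON-RPC 2.0 request shape that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for a given tool name and arguments.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Frame a message for the stdio transport: one JSON document per line.
// JSON.stringify without indentation never emits raw newlines.
function frameForStdio(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

// HYPOTHETICAL tool name and argument key, for illustration only.
const request = buildToolCall(1, "url_to_markdown", {
  url: "https://example.com/article",
});
const line = frameForStdio(request);
```

In practice an MCP client library handles this framing, along with the initialization handshake that precedes any tool call, so hand-built messages like this are mainly useful for debugging.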
Key Features
01 Convert URLs to clean Markdown with optional title prepending and link stripping
02 Built on robust libraries including Turndown, Readability, and JSDOM
03 Convert raw HTML to clean Markdown with optional source URL for relative links
04 Utilizes Mozilla Readability for intelligent content extraction and noise reduction
05 Supports stdio transport for seamless integration with MCP-compatible AI clients
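The optional behaviors listed above (title prepending, link stripping, and resolving relative links against a source URL) can be illustrated with a small post-processing sketch. This is not how the server or `urltomarkdown` implements them. It is a dependency-free approximation that operates on Markdown text, and the `MarkdownOptions` fields and `postProcess` function are hypothetical names.

```typescript
// HYPOTHETICAL option names; the server's real parameters may differ.
interface MarkdownOptions {
  title?: string;       // prepend as a level-1 heading when set
  stripLinks?: boolean; // replace [text](url) with just the text
  baseUrl?: string;     // resolve relative link targets against this URL
}

// Matches inline links; the optional leading "!" lets us skip images.
const INLINE_LINK = /(!?)\[([^\]]*)\]\(([^)\s]+)\)/g;

function postProcess(markdown: string, opts: MarkdownOptions = {}): string {
  const out = markdown.replace(
    INLINE_LINK,
    (match: string, bang: string, text: string, href: string) => {
      if (bang === "!") return match; // leave images untouched
      if (opts.stripLinks) return text;
      if (opts.baseUrl) {
        try {
          // The WHATWG URL constructor resolves relative references.
          return `[${text}](${new URL(href, opts.baseUrl).href})`;
        } catch {
          return match; // unparseable target: keep the original link
        }
      }
      return match;
    },
  );
  return opts.title ? `# ${opts.title}\n\n${out}` : out;
}
```

For example, `postProcess("[docs](/docs/a)", { baseUrl: "https://example.com/blog/post" })` rewrites the link target to `https://example.com/docs/a`. A regular expression is a simplification here: it does not handle nested brackets, link titles, or reference-style links.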
Use Cases
01 Extract structured content from web pages for data analysis and processing
02 Feed clean web content into Retrieval-Augmented Generation (RAG) pipelines
03 Enable AI assistants to read and summarize online articles, documentation, or blog posts
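For the RAG use case, converted Markdown is typically split into chunks before embedding. One simple approach, sketched below, is to split on headings so each chunk keeps its section title. This is an assumed downstream step rather than part of the server, and `Chunk` and `chunkByHeading` are hypothetical names.

```typescript
interface Chunk {
  heading: string; // heading text, or "" for content before the first heading
  body: string;    // section text with surrounding whitespace trimmed
}

// Split Markdown into one chunk per ATX heading ("#" through "######").
function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let lines: string[] = [];

  // Emit the section collected so far, skipping fully empty ones.
  const flush = (): void => {
    const body = lines.join("\n").trim();
    if (heading !== "" || body !== "") chunks.push({ heading, body });
    lines = [];
  };

  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush();
      heading = (match[1] || "").trim();
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```

This sketch ignores fenced code blocks, so a line starting with `#` inside a code sample would be treated as a heading. A production pipeline would likely also cap chunk size and add overlap between chunks.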