About
This server exposes the Trafilatura library through a Model Context Protocol (MCP) interface, so developers and models can programmatically extract the main text and metadata from web pages. It provides a single `fetch_and_extract` tool that runs asynchronously, which makes it suitable for integration with MCP-compatible clients such as IDEs or coding agents.
Key Features
- Web scraping for main text content
- Metadata extraction (title, author, date, etc.)
- Configurable content inclusion (comments, tables)
- Single, easy-to-use `fetch_and_extract` tool
- Asynchronous architecture for efficient I/O
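
The features above can be pictured with a minimal Python sketch of what a `fetch_and_extract`-style function might look like. This is not the server's actual implementation: the parameter names (`include_comments`, `include_tables`), the returned dictionary shape, and the injectable `fetch` and `extract` helpers are assumptions made for illustration. The Trafilatura calls used (`fetch_url`, `extract`, `extract_metadata`) are part of the library's public API; because they are blocking, the sketch moves them to worker threads to keep the event loop free.

```python
import asyncio
from typing import Any, Callable, Dict, Optional


def _trafilatura_fetch(url: str) -> Optional[str]:
    # Imported lazily so the module loads even where Trafilatura is absent.
    import trafilatura
    return trafilatura.fetch_url(url)  # returns None when the download fails


def _trafilatura_extract(html: str, include_comments: bool,
                         include_tables: bool) -> Dict[str, Any]:
    import trafilatura
    text = trafilatura.extract(
        html,
        include_comments=include_comments,
        include_tables=include_tables,
    )
    meta = trafilatura.extract_metadata(html)
    # getattr with a default also covers the case where no metadata is found.
    return {
        "text": text,
        "title": getattr(meta, "title", None),
        "author": getattr(meta, "author", None),
        "date": getattr(meta, "date", None),
    }


async def fetch_and_extract(
    url: str,
    include_comments: bool = False,   # assumed option name
    include_tables: bool = True,      # assumed option name
    fetch: Callable[[str], Optional[str]] = _trafilatura_fetch,
    extract: Callable[[str, bool, bool], Dict[str, Any]] = _trafilatura_extract,
) -> Dict[str, Any]:
    """Hypothetical tool body: download a page, then extract text and metadata."""
    # Both steps are blocking calls, so run them in worker threads.
    html = await asyncio.to_thread(fetch, url)
    if html is None:
        return {"url": url, "error": "download failed"}
    result = await asyncio.to_thread(extract, html, include_comments, include_tables)
    return {"url": url, **result}


# Example call (needs network access and Trafilatura installed):
# asyncio.run(fetch_and_extract("https://example.org", include_tables=False))
```

Passing `fetch` and `extract` as parameters is only a convenience for testing the sketch without network access; a real server would more likely call the library directly inside the tool handler.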
Use Cases
- Programmatically fetching and parsing news articles or blog posts
- Integrating web content extraction into AI agents or models
- Building data collection pipelines that require clean web page content
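
As a rough sketch of the data collection use case, the snippet below runs an extraction coroutine over several URLs with bounded concurrency and serializes the results as JSON Lines. The names `collect_pages`, `extract_page`, and `max_concurrency` are hypothetical and not part of this server; `extract_page` stands in for whatever coroutine performs the extraction, for example a client-side wrapper around a call to the `fetch_and_extract` tool.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict, Iterable


async def collect_pages(
    urls: Iterable[str],
    extract_page: Callable[[str], Awaitable[Dict[str, Any]]],  # hypothetical extractor
    max_concurrency: int = 5,
) -> str:
    """Extract every URL concurrently and return one JSON object per line."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> Dict[str, Any]:
        # The semaphore caps how many extractions run at the same time.
        async with semaphore:
            try:
                return await extract_page(url)
            except Exception as exc:
                # Record the failure instead of aborting the whole batch.
                return {"url": url, "error": str(exc)}

    # gather() returns results in the same order as the input URLs.
    records = await asyncio.gather(*(worker(url) for url in urls))
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

Keeping failures as records in the output, rather than raising, lets a long-running pipeline finish the batch and retry only the URLs that failed.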