Trafilatura
Extracts main content and metadata from web pages using the Trafilatura library via a Model Context Protocol interface.
About
This server is a Model Context Protocol (MCP) interface to the Trafilatura library. It lets developers and models programmatically extract the main text and metadata from web pages. It exposes a single `fetch_and_extract` tool built on asynchronous I/O, which makes it suitable for integration with MCP-compatible clients such as IDEs or coding agents.
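To make the idea concrete, here is a minimal sketch of how a tool like `fetch_and_extract` might wrap Trafilatura. It is not the server's actual implementation: the parameter names (`include_comments`, `include_tables`), the injectable `fetch` and `extract` hooks, and the plain-text return value are assumptions made for illustration. The only library calls used are `trafilatura.fetch_url` and `trafilatura.extract`, which are blocking, so the sketch runs them in a worker thread.

```python
import asyncio
from typing import Callable, Optional


def _extract_sync(
    url: str,
    include_comments: bool,
    include_tables: bool,
    fetch: Optional[Callable],
    extract: Optional[Callable],
) -> Optional[str]:
    """Blocking download and extraction; returns None if the download fails."""
    if fetch is None or extract is None:
        # Imported lazily so the hooks can be replaced (for example in tests)
        # without Trafilatura being installed.
        import trafilatura

        fetch = fetch or trafilatura.fetch_url
        extract = extract or trafilatura.extract

    html = fetch(url)
    if html is None:
        return None
    return extract(
        html,
        url=url,
        include_comments=include_comments,
        include_tables=include_tables,
    )


async def fetch_and_extract(
    url: str,
    include_comments: bool = False,
    include_tables: bool = False,
    fetch: Optional[Callable] = None,    # hypothetical hook, not part of the server
    extract: Optional[Callable] = None,  # hypothetical hook, not part of the server
) -> Optional[str]:
    """Hypothetical async wrapper: runs the blocking work in a worker thread."""
    return await asyncio.to_thread(
        _extract_sync, url, include_comments, include_tables, fetch, extract
    )
```

With Trafilatura installed and network access available, `asyncio.run(fetch_and_extract("https://example.org"))` would return the extracted text, or `None` if the page could not be downloaded or no main content was found. Offloading to a thread keeps the event loop free while the download and parsing run.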
Key Features
- Web scraping for main text content
- Metadata extraction (title, author, date, etc.)
- Configurable content inclusion (comments, tables)
- Single, easy-to-use `fetch_and_extract` tool
- Asynchronous architecture for efficient I/O
Use Cases
- Programmatically fetching and parsing news articles or blog posts
- Integrating web content extraction into AI agents or models
- Building data collection pipelines that require clean web page content