
Trafilatura

Extracts main content and metadata from web pages using the Trafilatura library via a Model Context Protocol interface.

Introduction

This server is a Model Context Protocol (MCP) interface to the Trafilatura library. It lets developers and models programmatically extract the main text and metadata (such as title, author, and date) from web pages. It exposes a single `fetch_and_extract` tool that runs asynchronously, which makes it suitable for MCP-compatible clients such as IDEs and coding agents.
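
To make the asynchronous design concrete, here is a minimal sketch of how a `fetch_and_extract` coroutine could wrap Trafilatura's blocking calls. It is not the server's actual implementation: the parameter names, defaults, the injectable `extractor` argument, and the helper `_trafilatura_extract` are assumptions made for illustration. Only `trafilatura.fetch_url` and `trafilatura.extract` (with its `include_comments` and `include_tables` options) come from the library itself.

```python
import asyncio
from typing import Callable, Optional


def _trafilatura_extract(
    url: str, include_comments: bool, include_tables: bool
) -> Optional[str]:
    """Blocking download and extraction (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without Trafilatura installed.
    import trafilatura

    downloaded = trafilatura.fetch_url(url)  # returns None if the download fails
    if downloaded is None:
        return None
    return trafilatura.extract(
        downloaded,
        include_comments=include_comments,
        include_tables=include_tables,
    )


async def fetch_and_extract(
    url: str,
    include_comments: bool = False,  # assumed default, not taken from the server
    include_tables: bool = False,    # assumed default, not taken from the server
    extractor: Callable[[str, bool, bool], Optional[str]] = _trafilatura_extract,
) -> Optional[str]:
    """Run the blocking extractor in a worker thread (requires Python 3.9+)."""
    # asyncio.to_thread keeps the event loop free while the download runs.
    return await asyncio.to_thread(extractor, url, include_comments, include_tables)
```

Passing the extractor as an argument is a design choice for this sketch only: it lets the coroutine be exercised with a stub instead of a live network request.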

Key Features

  • Web scraping for main text content
  • Metadata extraction (title, author, date, etc.)
  • Configurable content inclusion (comments, tables)
  • Single, easy-to-use `fetch_and_extract` tool
  • Asynchronous architecture for efficient I/O

Use Cases

  • Programmatically fetching and parsing news articles or blog posts
  • Integrating web content extraction into AI agents or models
  • Building data collection pipelines that require clean web page content
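
For the data collection use case, one possible shape is a small batch collector that calls an extraction coroutine for many URLs with bounded concurrency. This is a sketch under stated assumptions: `collect`, `extract_one`, and `max_concurrency` are hypothetical names, and `extract_one` stands in for whatever client call invokes the `fetch_and_extract` tool.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Optional


async def collect(
    urls: Iterable[str],
    extract_one: Callable[[str], Awaitable[Optional[str]]],  # hypothetical client call
    max_concurrency: int = 5,
) -> Dict[str, Optional[str]]:
    """Extract text for each URL, running at most max_concurrency calls at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str):
        async with semaphore:
            try:
                return url, await extract_one(url)
            except Exception:
                # A failed page is recorded as None so one bad URL
                # does not abort the whole batch.
                return url, None

    pairs = await asyncio.gather(*(guarded(url) for url in urls))
    return dict(pairs)
```

The semaphore caps how many requests are in flight at once, which helps a pipeline avoid overloading the sites it reads from.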