Cleans noisy web pages and extracts only the relevant text for large language models.
WebShift is a Rust library and MCP server that turns noisy web pages into clean, context-ready text for LLMs. It strips out extraneous elements such as scripts, ads, navigation menus, and cookie banners, so irrelevant content does not flood a model's context window. WebShift also sanitizes the extracted text and enforces strict size budgets. It adds search integration with BM25 reranking, plus optional LLM-driven query expansion and summarization. Together, these features ensure that models receive only the information they need to reason and respond effectively.
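To illustrate what BM25 reranking does, here is a minimal, standard-library-only sketch of Okapi BM25 scoring in Rust. It is a generic illustration, not WebShift's API. The names `tokenize` and `bm25_rerank`, the simple alphanumeric tokenizer, and the parameter values `k1 = 1.2` and `b = 0.75` (common textbook defaults) are all assumptions chosen for this example. Each candidate text is scored against the query, and the candidates are returned from most to least relevant.

```rust
use std::collections::HashMap;

/// Hypothetical helper: split text into lowercase alphanumeric terms.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical helper: score each document against `query` with Okapi BM25
/// and return (document index, score) pairs sorted by descending score.
fn bm25_rerank(query: &str, docs: &[&str], k1: f64, b: f64) -> Vec<(usize, f64)> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    if tokenized.is_empty() {
        return Vec::new();
    }
    let n = tokenized.len() as f64;
    // Average document length in terms.
    let avgdl = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64 / n;

    // Document frequency: the number of documents containing each term.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let mut seen: Vec<&str> = doc.iter().map(|s| s.as_str()).collect();
        seen.sort_unstable();
        seen.dedup();
        for term in seen {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    let mut scored: Vec<(usize, f64)> = tokenized
        .iter()
        .enumerate()
        .map(|(i, doc)| {
            let dl = doc.len() as f64;
            // Term frequencies within this document.
            let mut tf: HashMap<&str, usize> = HashMap::new();
            for t in doc {
                *tf.entry(t.as_str()).or_insert(0) += 1;
            }
            let score: f64 = query_terms
                .iter()
                .map(|q| {
                    let f = *tf.get(q.as_str()).unwrap_or(&0) as f64;
                    if f == 0.0 {
                        return 0.0; // Term absent: contributes nothing.
                    }
                    let n_q = *df.get(q.as_str()).unwrap_or(&0) as f64;
                    // Non-negative IDF variant (the "+ 1.0" inside the log).
                    let idf = ((n - n_q + 0.5) / (n_q + 0.5) + 1.0).ln();
                    // Saturating term frequency with length normalization.
                    idf * f * (k1 + 1.0) / (f + k1 * (1.0 - b + b * dl / avgdl))
                })
                .sum();
            (i, score)
        })
        .collect();

    // Highest score first.
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored
}

fn main() {
    let docs = [
        "Cookie banner: accept all cookies to continue",
        "Rust ownership rules explained with borrowing examples",
        "Borrowing in Rust: references, lifetimes and the borrow checker",
    ];
    for (idx, score) in bm25_rerank("rust borrowing", &docs, 1.2, 0.75) {
        println!("{score:.3}  {}", docs[idx]);
    }
}
```

The `k1` parameter caps how much repeated occurrences of a term can add, and `b` controls how strongly longer texts are penalized. With these settings, boilerplate such as the cookie-banner text scores zero for a topical query and sinks to the bottom of the ranking.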
