Heading-aware chunking that keeps each chunk's heading context and never splits a table across chunks.
Adaptive fetcher routing: dedicated fetchers for sources such as Wikipedia and YouTube, with a Playwright-based browser fallback for everything else.
Page profiling driven by a vision-language model (VLM), used to optimize content extraction when a site is visited again.
An MCP (Model Context Protocol) server exposing the `fetch_page` and `profile_page` tools.
Query-aware dense retrieval using bge-m3 embeddings and cross-encoder reranking.
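The heading-aware chunking idea can be illustrated with a minimal sketch. It assumes Markdown input, treats consecutive `|`-prefixed lines as one atomic table block, and packs blocks into chunks of up to `max_chars` characters within each section. The names `chunk_markdown`, `Chunk`, and `max_chars` are hypothetical and do not come from the project. Fenced code blocks are not handled, and the project's actual splitting rules may differ.

```python
import re
from dataclasses import dataclass

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    headings: list[str]  # heading path, e.g. ["Guide", "Data"]
    text: str

    def with_context(self) -> str:
        """Prefix the text with its heading path so the chunk stands alone."""
        return " > ".join(self.headings) + "\n\n" + self.text


def _blocks(lines: list[str]) -> list[str]:
    """Group lines into blocks: consecutive table rows form one atomic block,
    other non-blank lines are grouped into paragraphs."""
    blocks: list[str] = []
    current: list[str] = []
    in_table = False
    for line in lines:
        is_table = line.lstrip().startswith("|")
        if not line.strip():  # blank line ends the current block
            if current:
                blocks.append("\n".join(current))
                current, in_table = [], False
            continue
        if current and is_table != in_table:  # paragraph/table boundary
            blocks.append("\n".join(current))
            current = []
        current.append(line)
        in_table = is_table
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_markdown(md: str, max_chars: int = 800) -> list[Chunk]:
    """Hypothetical heading-aware chunker (a sketch, not the project's code)."""
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # stack of (heading level, title)
    section: list[str] = []

    def flush() -> None:
        titles = [title for _, title in path]
        buf: list[str] = []
        for block in _blocks(section):
            # Start a new chunk if adding this block would exceed the budget.
            if buf and len("\n\n".join(buf + [block])) > max_chars:
                chunks.append(Chunk(titles, "\n\n".join(buf)))
                buf = []
            buf.append(block)  # a table is never split, even if oversized
        if buf:
            chunks.append(Chunk(titles, "\n\n".join(buf)))
        section.clear()

    for line in md.splitlines():
        m = HEADING_RE.match(line)
        if m:
            flush()  # close the previous section under its own heading path
            level = len(m.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, m.group(2).strip()))
        else:
            section.append(line)
    flush()
    return chunks
```

Carrying the heading path on each chunk, rather than only in the first chunk of a section, lets a retriever match a chunk on its section titles even when the body text never repeats them.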
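Adaptive fetcher routing could work roughly as follows. This is a sketch under assumptions: routing is by URL hostname, each fetcher is a plain callable from URL to text, and a specialized fetcher that raises or returns empty text falls back to the browser fetcher. `FetcherRouter`, `register`, and `pick` are hypothetical names, and the real Wikipedia, YouTube, and Playwright fetchers are replaced by stubs.

```python
from typing import Callable
from urllib.parse import urlparse

Fetcher = Callable[[str], str]  # takes a URL, returns extracted text


class FetcherRouter:
    """Hypothetical router: pick a fetcher by domain, else use the fallback."""

    def __init__(self, fallback: Fetcher):
        self.rules: list[tuple[tuple[str, ...], Fetcher]] = []
        self.fallback = fallback  # e.g. a Playwright-based browser fetcher

    def register(self, domains: tuple[str, ...], fetcher: Fetcher) -> None:
        self.rules.append((domains, fetcher))

    def pick(self, url: str) -> Fetcher:
        host = (urlparse(url).hostname or "").lower()
        for domains, fetcher in self.rules:
            # Match the domain itself or any subdomain (en.wikipedia.org).
            if any(host == d or host.endswith("." + d) for d in domains):
                return fetcher
        return self.fallback

    def fetch(self, url: str) -> str:
        fetcher = self.pick(url)
        if fetcher is not self.fallback:
            try:
                text = fetcher(url)
                if text.strip():
                    return text
            except Exception:
                pass  # specialized fetcher failed; use the browser instead
        return self.fallback(url)
```

In use, one would register something like `router.register(("wikipedia.org",), fetch_wikipedia)` and `router.register(("youtube.com", "youtu.be"), fetch_youtube)`, where both fetcher names are hypothetical. Checking the dot-prefixed suffix keeps a look-alike host such as `notwikipedia.org` from being routed to the Wikipedia fetcher.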
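One way to make VLM page profiling pay off on repeated visits is to cache one profile per site. The sketch below assumes that a "site" means a URL hostname and that a profile consists of a content selector plus selectors to drop. The expensive VLM call, for example on a page screenshot, is represented by a caller-supplied `profiler` function. `PageProfile`, `ProfileStore`, and the profile fields are all hypothetical. The project may profile at a different granularity or store different data.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urlparse


@dataclass
class PageProfile:
    content_selector: str  # hypothetical: CSS selector for the main content
    drop_selectors: list[str] = field(default_factory=list)  # nav, ads, footers


Profiler = Callable[[str], PageProfile]  # stand-in for the VLM call


class ProfileStore:
    """Hypothetical per-host cache: run the profiler once per site, then reuse."""

    def __init__(self, profiler: Profiler, path: Optional[Path] = None):
        self.profiler = profiler
        self.path = path  # optional JSON file so profiles survive restarts
        self.profiles: dict[str, PageProfile] = {}
        if path is not None and path.exists():
            raw = json.loads(path.read_text())
            self.profiles = {host: PageProfile(**p) for host, p in raw.items()}

    def get(self, url: str) -> PageProfile:
        host = (urlparse(url).hostname or "").lower()
        if host not in self.profiles:
            # Expensive VLM profiling happens only on the first visit to a host.
            self.profiles[host] = self.profiler(url)
            self._save()
        return self.profiles[host]

    def _save(self) -> None:
        if self.path is not None:
            data = {host: asdict(p) for host, p in self.profiles.items()}
            self.path.write_text(json.dumps(data, indent=2))
```

Keying by hostname assumes that pages on one site share a layout. Sites with several distinct templates would need a finer key, such as host plus a path prefix.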
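Dense retrieval followed by cross-encoder reranking is a two-stage pipeline, sketched below. Stage one ranks all chunks by cosine similarity between precomputed embeddings. In the project these would be bge-m3 vectors, but here they are plain lists of floats. Stage two rescores only the shortlist with a `(query, passage)` scorer. That scorer stands in for a cross-encoder model, and the toy one in the test is purely illustrative. `retrieve_and_rerank`, `k_dense`, and `k_final` are hypothetical names and defaults.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_and_rerank(
    query: str,
    query_vec: Vector,
    chunks: Sequence[str],
    chunk_vecs: Sequence[Vector],
    rerank_score: Callable[[str, str], float],  # stand-in for a cross-encoder
    k_dense: int = 20,
    k_final: int = 5,
) -> list[tuple[str, float]]:
    # Stage 1: cheap dense retrieval over precomputed embeddings.
    order = sorted(
        range(len(chunks)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    shortlist = order[:k_dense]
    # Stage 2: score each (query, chunk) pair jointly and keep the best.
    scored = [(chunks[i], rerank_score(query, chunks[i])) for i in shortlist]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k_final]
```

The split reflects a cost trade-off. Embedding similarity is cheap enough to run over every chunk. A cross-encoder reads the query and passage together, which makes it more query-aware but too expensive for anything beyond a short candidate list.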