01 0 GitHub stars
02 Recursive bulk crawling with configurable depth and parallel page loading
03 Intelligent content cleaning that removes navigation, headers, footers, and duplicates
04 AI-powered metadata enrichment for intelligent descriptions and keywords
05 Pure markdown output with metadata stored in separate JSON files for portability
06 Two-phase architecture separating raw data acquisition from markdown refinement
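
The crawling and two-phase items above can be illustrated with a minimal sketch. This is not the project's actual implementation: the in-memory `FAKE_SITE`, the `fetch`, `crawl`, and `refine` functions, and the file-naming scheme are all hypothetical stand-ins chosen so the example runs without network access. It shows a depth-limited crawl that loads each depth level in parallel (phase 1), followed by a refinement step that writes one markdown file and one sidecar JSON metadata file per page (phase 2).

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical in-memory "site": URL -> (raw text, outgoing links).
FAKE_SITE = {
    "/": ("Home page", ["/docs", "/blog"]),
    "/docs": ("Docs index", ["/docs/api", "/"]),
    "/blog": ("Blog index", []),
    "/docs/api": ("API reference", ["/docs/api/deep"]),
    "/docs/api/deep": ("Deep page", []),
}


def fetch(url):
    """Stand-in for a real page load; returns (url, text, links)."""
    text, links = FAKE_SITE[url]
    return url, text, links


def crawl(start, max_depth, workers=4):
    """Phase 1: breadth-first crawl, loading each depth level in parallel."""
    raw = {}
    seen = {start}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _depth in range(max_depth + 1):
            if not frontier:
                break
            next_frontier = []
            # pool.map fetches every page of the current level concurrently.
            for url, text, links in pool.map(fetch, frontier):
                raw[url] = text
                for link in links:
                    if link in FAKE_SITE and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return raw


def refine(raw, out_dir):
    """Phase 2: write a markdown file plus a sidecar JSON file per page."""
    out_dir = Path(out_dir)
    for url, text in raw.items():
        stem = url.strip("/").replace("/", "_") or "index"
        (out_dir / f"{stem}.md").write_text(f"# {text}\n", encoding="utf-8")
        meta = {"source": url, "title": text}  # assumed metadata fields
        (out_dir / f"{stem}.json").write_text(json.dumps(meta), encoding="utf-8")


if __name__ == "__main__":
    pages = crawl("/", max_depth=2)
    with tempfile.TemporaryDirectory() as tmp:
        refine(pages, tmp)
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Keeping the raw crawl result separate from the refinement step means phase 2 can be re-run with different cleaning or metadata rules without fetching any page again, which is one plausible motivation for a two-phase design.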