01 0 GitHub stars
02 Typed Schema.org parsing for Place, Article, Event, and Organization data
03 HTTP page fetching with status detection and stealth capabilities
04 Article text extraction using Trafilatura
05 Extraction of Open Graph image (og:image) and date metadata
06 Fact extraction via a JSON-LD, Microdata, and regex fallback cascade
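Typed Schema.org parsing (item 02) can be sketched as mapping JSON-LD nodes onto small typed records. This is a minimal illustration, not the project's actual models: the dataclasses, their fields, the `parse_jsonld` and `parse_entity` helpers, and the choice of Article subtypes are all hypothetical assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical typed records; the fields are illustrative only.
@dataclass
class Place:
    name: Optional[str] = None
    address: Optional[str] = None

@dataclass
class Article:
    headline: Optional[str] = None
    date_published: Optional[str] = None

@dataclass
class Event:
    name: Optional[str] = None
    start_date: Optional[str] = None

@dataclass
class Organization:
    name: Optional[str] = None
    url: Optional[str] = None

SchemaEntity = Union[Place, Article, Event, Organization]
ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting"}  # assumed subset of subtypes

def _text(value) -> Optional[str]:
    """Flatten a value that may be a string or a nested object such as PostalAddress."""
    if value is None or isinstance(value, str):
        return value
    if isinstance(value, dict):
        parts = [value.get(k) for k in ("streetAddress", "addressLocality", "addressCountry")]
        joined = ", ".join(p for p in parts if isinstance(p, str))
        return joined or value.get("name")
    return str(value)

def parse_entity(node: dict) -> Optional[SchemaEntity]:
    """Map one JSON-LD node to a typed record based on its @type (string or list)."""
    raw = node.get("@type")
    types = set(raw) if isinstance(raw, list) else {raw}
    if types & ARTICLE_TYPES:
        return Article(_text(node.get("headline")), _text(node.get("datePublished")))
    if "Event" in types:
        return Event(_text(node.get("name")), _text(node.get("startDate")))
    if "Organization" in types:
        return Organization(_text(node.get("name")), _text(node.get("url")))
    if "Place" in types:
        return Place(_text(node.get("name")), _text(node.get("address")))
    return None  # unsupported types are ignored

def parse_jsonld(blob: str) -> List[SchemaEntity]:
    """Parse a JSON-LD string that may be a single node, a list, or an @graph."""
    data = json.loads(blob)
    nodes = data.get("@graph", [data]) if isinstance(data, dict) else data
    return [e for n in nodes if isinstance(n, dict) for e in [parse_entity(n)] if e is not None]
```

Returning dataclasses rather than raw dicts lets downstream code rely on fixed field names regardless of how a site nested its markup.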
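The og:image and date extraction (item 05) could work roughly as follows, using only the standard-library `html.parser`. This is a sketch under assumptions: the `twitter:image` fallback, the order of the date keys in `DATE_KEYS`, and the `<time datetime>` fallback are illustrative choices, not the project's documented behavior.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple

class MetaCollector(HTMLParser):
    """Collects <meta> key/content pairs and the first <time datetime> value."""

    def __init__(self):
        super().__init__()
        self.meta: Dict[str, str] = {}
        self.time_datetime: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            key = attr.get("property") or attr.get("name") or attr.get("itemprop")
            content = attr.get("content")
            if key and content is not None:
                self.meta.setdefault(key.lower(), content)  # keep the first occurrence
        elif tag == "time" and self.time_datetime is None and attr.get("datetime"):
            self.time_datetime = attr["datetime"]

# Assumed priority order for publication-date keys (lowercased).
DATE_KEYS = ("article:published_time", "datepublished", "date", "dc.date")

def extract_image_and_date(html: str) -> Tuple[Optional[str], Optional[str]]:
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    image = parser.meta.get("og:image") or parser.meta.get("twitter:image")
    date = next((parser.meta[k] for k in DATE_KEYS if k in parser.meta), None)
    return image, date or parser.time_datetime
```

Metadata tags are checked before visible `<time>` elements because publishers usually set them deliberately, while body dates can refer to comments or related stories.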
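The fact-extraction cascade (item 06) can be read as "try the most structured source first, then fall back". The sketch below assumes a "fact" is a single named property (for example `telephone`) plus a caller-supplied regex. The `extract_fact` helper, the `StructuredDataCollector` class, and the source labels they return are hypothetical. Microdata handling is deliberately simplified: it ignores `itemscope` nesting, and text capture stops at the first closing tag with the same name.

```python
import json
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

class StructuredDataCollector(HTMLParser):
    """Gathers JSON-LD script bodies, simple Microdata values, and visible text."""
    VOID = {"meta", "link", "img", "br", "input"}

    def __init__(self):
        super().__init__()
        self.jsonld_blocks: List[str] = []
        self.microdata: Dict[str, str] = {}
        self.text_parts: List[str] = []
        self._jsonld_buf: Optional[List[str]] = None  # set while inside a JSON-LD <script>
        self._skip = False                            # inside other <script>/<style>
        self._prop: Optional[Tuple[str, str]] = None  # (itemprop, tag) being captured
        self._prop_buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("script", "style"):
            if tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
                self._jsonld_buf = []
            else:
                self._skip = True
            return
        prop = attr.get("itemprop")
        if not prop:
            return
        if attr.get("content") is not None:
            self.microdata.setdefault(prop, attr["content"])
        elif tag not in self.VOID and self._prop is None:
            self._prop, self._prop_buf = (prop, tag), []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._jsonld_buf is not None:
                self.jsonld_blocks.append("".join(self._jsonld_buf))
                self._jsonld_buf = None
            self._skip = False
        elif self._prop is not None and tag == self._prop[1]:
            self.microdata.setdefault(self._prop[0], "".join(self._prop_buf).strip())
            self._prop = None

    def handle_data(self, data):
        if self._jsonld_buf is not None:
            self._jsonld_buf.append(data)
        elif not self._skip:
            self.text_parts.append(data)
            if self._prop is not None:
                self._prop_buf.append(data)

def _find_in_jsonld(node, key: str) -> Optional[str]:
    """Depth-first search for a scalar `key` in parsed JSON-LD (dicts, lists, @graph)."""
    if isinstance(node, dict):
        if isinstance(node.get(key), (str, int, float)):
            return str(node[key])
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = _find_in_jsonld(child, key)
        if found is not None:
            return found
    return None

def extract_fact(html: str, key: str, pattern: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source), trying JSON-LD, then Microdata, then regex."""
    collector = StructuredDataCollector()
    collector.feed(html)
    collector.close()
    for block in collector.jsonld_blocks:
        try:
            found = _find_in_jsonld(json.loads(block), key)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common; skip it
        if found is not None:
            return found, "json-ld"
    if key in collector.microdata:
        return collector.microdata[key], "microdata"
    match = re.search(pattern, " ".join(collector.text_parts))
    return (match.group(0), "regex") if match else (None, None)
```

Reporting the source alongside the value lets a caller weigh a regex hit as less trustworthy than a declared JSON-LD property.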