2 GitHub stars
AI-powered content summarization with custom natural language queries
Granular control over extraction limits for text, links, and image metadata
Mandatory live crawling to ensure access to real-time, non-cached web content
Structured data extraction using custom JSON Schema definitions
Token-optimized output formats, including 'toon' and JSON, for efficient processing
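
To make the features above concrete, here is a minimal sketch of how a single request combining them might be assembled. It is an illustration only: the class `ExtractionRequest`, its field names (`summary_query`, `max_text_chars`, `max_links`, `max_images`, `schema`, `output_format`), and the payload keys are all hypothetical and are not taken from the tool's actual API. Only the two format names, 'toon' and JSON, and the fact that live crawling is mandatory come from the list.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Format names taken from the feature list; everything else here is assumed.
ALLOWED_FORMATS = {"toon", "json"}


@dataclass
class ExtractionRequest:
    """Hypothetical request shape; field names are illustrative, not the real API."""

    url: str
    summary_query: Optional[str] = None   # natural language query for the summary
    max_text_chars: int = 2000            # assumed limit on extracted text
    max_links: int = 10                   # assumed limit on extracted links
    max_images: int = 5                   # assumed limit on image metadata entries
    schema: Optional[dict] = None         # custom JSON Schema for structured output
    output_format: str = "json"

    def to_payload(self) -> dict:
        """Validate the settings and return a plain dict ready to serialize."""
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")
        for name in ("max_text_chars", "max_links", "max_images"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")

        payload = {
            "url": self.url,
            # Live crawling is described as mandatory, so it is not configurable here.
            "live_crawl": True,
            "limits": {
                "text_chars": self.max_text_chars,
                "links": self.max_links,
                "images": self.max_images,
            },
            "format": self.output_format,
        }
        if self.summary_query is not None:
            payload["summary"] = {"query": self.summary_query}
        if self.schema is not None:
            payload["schema"] = self.schema
        return payload


# Usage: summarize a page, cap the extracted links, and request structured output.
request = ExtractionRequest(
    url="https://example.com/article",
    summary_query="What are the main findings?",
    max_links=3,
    schema={
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
    },
    output_format="toon",
)
payload = request.to_payload()
print(json.dumps(payload, indent=2))
```

Hard-coding `live_crawl` rather than exposing it as a field mirrors the "mandatory" wording in the list; a real client may handle this differently.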