01 High-resolution image understanding, including OCR and object detection
02 Long-form audio transcription and speech analysis for recordings of up to 9.5 hours
03 Native PDF support for multi-page document extraction and analysis
04 Comprehensive video processing for scenes, summaries, and timestamps
05 Granular resolution control (low/medium/high) to optimize token usage