High-fidelity image generation and editing with support for multiple aspect ratios.
Advanced object detection and pixel-level segmentation using Gemini 2.5+ models.
Comprehensive audio transcription and analysis for recordings up to 9.5 hours long.
Temporal video understanding and scene detection with YouTube URL support.
Native PDF vision processing for structured data extraction from forms and tables.
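
As a rough illustration of the object detection feature, the sketch below converts a detection box into pixel coordinates. It assumes the convention described in the Gemini image understanding documentation, where a box is given as [ymin, xmin, ymax, xmax] on a 0 to 1000 grid. The names `PixelBox` and `to_pixel_box` are hypothetical helpers introduced here for illustration and are not part of any SDK.

```python
from typing import NamedTuple, Sequence


class PixelBox(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(box_2d: Sequence[int], image_width: int, image_height: int) -> PixelBox:
    """Scale a [ymin, xmin, ymax, xmax] box from a 0-1000 grid to pixels.

    Assumption: the box follows the normalized 0-1000 convention; adjust
    the divisor if the model output uses a different scale.
    """
    if len(box_2d) != 4:
        raise ValueError("expected four values: ymin, xmin, ymax, xmax")
    ymin, xmin, ymax, xmax = box_2d
    # Integer arithmetic avoids floating-point rounding surprises.
    return PixelBox(
        left=xmin * image_width // 1000,
        top=ymin * image_height // 1000,
        right=xmax * image_width // 1000,
        bottom=ymax * image_height // 1000,
    )


if __name__ == "__main__":
    # Example box on a 1920x1080 image.
    print(to_pixel_box([100, 200, 500, 800], 1920, 1080))
```

The result can be passed to an image library's crop or draw call. Integer division rounds coordinates down, which is usually acceptable for drawing overlays.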