01. High-fidelity image generation and editing, with support for multiple aspect ratios.
02. Native PDF vision processing for extracting structured data from tables and forms.
03. Object detection and pixel-level segmentation with Gemini 2.5 models.
04. Advanced video understanding, including scene detection and YouTube URL support.
05. Comprehensive audio transcription and analysis for files up to 9.5 hours long.
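To make the object detection item concrete, here is a minimal sketch that converts detection boxes into pixel coordinates. It assumes the model was asked to return a JSON list of objects with a `label` and a `box_2d` of `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 scale. `PixelBox`, `to_pixel_box`, and `parse_detections` are hypothetical helpers for illustration, not part of any SDK.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelBox:
    """Axis-aligned box in absolute pixels (hypothetical helper type)."""
    x0: int
    y0: int
    x1: int
    y1: int


def to_pixel_box(box_2d, width, height):
    """Map an assumed [ymin, xmin, ymax, xmax] box on a 0-1000 scale to pixels."""
    if len(box_2d) != 4:
        raise ValueError("box_2d must have exactly four values")
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing so integer inputs scale without float drift.
    return PixelBox(
        x0=round(xmin * width / 1000),
        y0=round(ymin * height / 1000),
        x1=round(xmax * width / 1000),
        y1=round(ymax * height / 1000),
    )


def parse_detections(response_text, width, height):
    """Parse an assumed JSON list of {"box_2d": [...], "label": "..."} objects."""
    detections = json.loads(response_text)
    return [
        (item["label"], to_pixel_box(item["box_2d"], width, height))
        for item in detections
    ]


if __name__ == "__main__":
    # Example payload in the assumed response format (not real model output).
    sample = '[{"box_2d": [100, 200, 500, 800], "label": "dog"}]'
    for label, box in parse_detections(sample, width=1920, height=1080):
        print(label, box)
```

Scaling by the original image width and height, rather than a resized preview, keeps the boxes aligned with the source pixels. If the model wraps its JSON in a markdown code fence, strip the fence before calling `json.loads`.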