01 Native PDF vision processing for multi-page document extraction and table analysis.
02 Pixel-level image segmentation and object detection using Gemini 2.0/2.5 models.
03 Comprehensive audio transcription and analysis for files up to 9.5 hours.
04 Advanced video understanding with scene detection and temporal Q&A support.
05 High-fidelity text-to-image generation, editing, and multi-image composition.
06 81 GitHub stars
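
As a rough illustration of how a file such as a PDF or audio clip might be passed to a Gemini model alongside a text prompt, here is a minimal sketch that only builds the JSON body in the shape accepted by the public Gemini REST `generateContent` endpoint (an inline base64 file part followed by a text part). The helper names `guess_mime_type` and `build_inline_request`, the sample filename, and the prompt are hypothetical and chosen for illustration; this is not necessarily how this project issues its requests, and nothing is sent over the network.

```python
import base64
import mimetypes


def guess_mime_type(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file extension (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(filename)
    return mime or default


def build_inline_request(file_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Build a generateContent-style body: one inline file part, one text part.

    Hypothetical helper. It only assembles a dictionary and makes no API call.
    """
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ]
    }


# Illustrative input: a few placeholder bytes standing in for a real PDF.
body = build_inline_request(
    b"%PDF-1.4",
    guess_mime_type("report.pdf"),
    "List the tables in this document.",
)
```

Inline data of this kind suits small files only. Large inputs, such as multi-hour audio or long video, would more plausibly go through a separate file-upload step and be referenced from the request rather than embedded in it.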