0181 GitHub stars
02High-resolution image generation and pixel-level segmentation for visual editing.
03Native PDF vision processing for accurate table, form, and chart extraction.
04Advanced video analysis with scene detection, temporal Q&A, and YouTube URL support.
05Support for Gemini 2.5/2.0 models with massive 2M token context windows.
06Comprehensive audio processing including transcription, speaker ID, and text-to-speech.