01 Visual understanding, including object detection, pixel-level segmentation, and OCR
02 Structured data extraction from complex PDF documents, tables, and charts
03 Comprehensive audio transcription and analysis for files up to 9.5 hours
04 Integrated text-to-image generation and refinement using Gemini models
05 Advanced video processing with scene detection and YouTube URL support
06 1 GitHub star