01Comprehensive audio transcription and speaker identification for files up to 9.5 hours.
020 GitHub stars
03Advanced video analysis including scene detection, temporal Q&A, and YouTube URL support.
04High-quality text-to-image generation and editing with precise control over aspect ratios and styles.
05Vision-based OCR, object detection, and pixel-level segmentation using Gemini 2.5+ models.
06Intelligent PDF extraction that converts tables, forms, and charts into structured JSON data.