01 Comprehensive audio transcription and speaker identification for files up to 9.5 hours
02 Intelligent image analysis featuring object detection, OCR, and pixel-level segmentation
03 Native PDF vision processing for extracting tables, forms, and charts into structured JSON
04 Advanced video understanding including scene detection, temporal Q&A, and YouTube support
05 High-quality text-to-image generation with support for multiple aspect ratios and iterative editing