01 Text-to-image generation and iterative editing with controllable aspect ratios
03 Advanced image understanding, including OCR, object detection, and pixel-level segmentation
04 Native PDF processing for documents of up to 1,000 pages, with structured table and form extraction
05 High-fidelity audio transcription and analysis for files of up to 9.5 hours
06 Long-form video processing with scene detection and temporal Q&A