01 Comprehensive audio and video analysis with timestamped transcriptions
02 High-fidelity document extraction from PDFs into structured JSON or Markdown
03 Support for massive context windows up to 2M tokens for long-form media
04 Text-to-image generation and editing with controllable aspect ratios and styles
05 Advanced vision capabilities including object detection and pixel-level segmentation