01 Image Captioning: Generate short, normal, or detailed captions for images.
02 Visual Question Answering: Ask natural language questions about image content.
03 Object Detection & Visual Pointing: Detect and locate specific objects, including precise coordinates.
04 URL Support & Batch Processing: Analyze images from local files and remote URLs, with efficient batch operations.
05 Device Optimization: Automatic detection and optimization for CPU, CUDA, and MPS (Apple Silicon).
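
The device detection mentioned under Device Optimization could look roughly like the sketch below. It assumes a PyTorch backend, which the list above does not confirm, and `pick_device` is a hypothetical helper name, not part of this project's API. The order of preference (CUDA, then MPS, then CPU) is also an assumption.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", in that assumed order of preference."""
    try:
        import torch  # assumption: the tool is built on PyTorch
    except ImportError:
        # Without PyTorch installed there is nothing to accelerate, so use CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps only exists in newer PyTorch releases, so look it up
    # defensively instead of assuming the attribute is present.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed to `torch.device(...)` or to a model's `.to(...)` call. Falling back to `"cpu"` keeps the helper usable on machines without a GPU.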