01Generates alt text, dense captions, and structured JSON metadata for images.
02Leverages large vision models via OpenRouter or local backends.
03Supports processing both remote image URLs and local file paths.
04Provides a minimal, composable MCP server for various computer vision tasks.
05Highly configurable for model selection, backend preference, and metadata generation modes.
060 GitHub stars