01 Provides APIs for detecting all recognizable objects, specific objects by text prompt, and human pose keypoints.
02 3 GitHub stars
03 Extracts accurate object counts, positions, and attributes from images.
04 Enables fine-grained image understanding, including full-scene recognition and targeted detection.
05 Integrates with MCP clients and other MCP servers for multi-step visual workflows.
06 Supports building natural-language-driven visual agents for real-world automation.
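
To make the "count, position, and attributes" feature concrete, here is a minimal Python sketch of how a client might summarize detection results. The `Detection` record, its field names, the `summarize` helper, and the sample values are all assumptions made for illustration; the server's actual response schema is not described here and may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record; the real response schema may differ."""
    category: str
    bbox: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    score: float


def summarize(detections, min_score=0.5):
    """Count objects per category and compute the center of each kept box."""
    # Drop low-confidence detections before counting (threshold is an assumption).
    kept = [d for d in detections if d.score >= min_score]
    counts = Counter(d.category for d in kept)
    centers = [
        (d.category, ((d.bbox[0] + d.bbox[2]) / 2, (d.bbox[1] + d.bbox[3]) / 2))
        for d in kept
    ]
    return dict(counts), centers


# Made-up sample data standing in for a detection response.
sample = [
    Detection("person", (10, 20, 110, 220), 0.92),
    Detection("person", (200, 30, 280, 210), 0.81),
    Detection("dog", (120, 150, 190, 220), 0.40),
]
counts, centers = summarize(sample)
```

In this sketch the low-scoring `dog` box is filtered out, so `counts` holds only the two `person` detections and `centers` holds their box midpoints. An agent could pass a summary like this to a later step in a multi-step workflow.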