Provides a mixture-of-experts visual question-answering server for complex visual grounding and image question-answering tasks.
Viper is a mixture-of-experts (MoE) visual question-answering (VQA) server for challenging problems in visual grounding, compositional image question answering, and image question answering that depends on external knowledge. It is built on the FastMCP framework and runs as a streamable-HTTP server, so any FastMCP client tooling can connect to it without extra integration work. Viper combines a diverse set of state-of-the-art models as its experts and can be installed either for local development or in a containerized environment.
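Because Viper speaks the Model Context Protocol over streamable HTTP, a client talks to it by POSTing JSON-RPC 2.0 messages to a single endpoint. The stdlib-only sketch below builds those messages so the wire format is visible. It rests on several assumptions that are not from this document: the URL `http://localhost:8000/mcp` (FastMCP's usual default path, but host, port, and path depend on how Viper is launched), the protocol version string, and the tool name `visual_question_answering` with its arguments, which are hypothetical placeholders for whatever tools Viper actually exposes.

```python
import json
import urllib.request
from typing import Optional

# Assumption: where the server listens. Adjust to match how Viper is started.
SERVER_URL = "http://localhost:8000/mcp"


def jsonrpc_request(method: str, params: dict, request_id: int) -> dict:
    """Build a JSON-RPC 2.0 request object, the message format MCP uses."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def build_initialize(request_id: int = 0) -> dict:
    """Build the MCP `initialize` request that opens a session."""
    return jsonrpc_request(
        "initialize",
        {
            # Assumption: a protocol revision that defines streamable HTTP.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "viper-sketch-client", "version": "0.1"},
        },
        request_id,
    )


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` request for a named tool."""
    return jsonrpc_request("tools/call", {"name": tool_name, "arguments": arguments}, request_id)


def make_http_request(message: dict, session_id: Optional[str] = None) -> urllib.request.Request:
    """Wrap a JSON-RPC message in the HTTP POST the streamable-HTTP transport expects."""
    headers = {
        "Content-Type": "application/json",
        # Clients must accept either a plain JSON reply or a server-sent event stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id is not None:
        # After `initialize`, the server may assign a session ID that later requests echo back.
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(message).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    call = build_tool_call(
        "visual_question_answering",
        {"image_path": "example.jpg", "question": "How many people are in the image?"},
    )
    print(json.dumps(call, indent=2))
```

Against a running server, each request would be sent with `urllib.request.urlopen(...)`, starting with `initialize`, followed by a `notifications/initialized` notification, and only then `tools/call`. In practice FastMCP's own `Client` class performs this handshake and response parsing for you; the sketch only makes the underlying messages explicit.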