Ask questions about image, audio, or video files and get answers from multimodal AI models.
Perception is a lightweight Model Context Protocol (MCP) server that adds media analysis to applications. It uses multimodal AI models served through fal.ai, so users can ask questions about the content of image, audio, and video files and receive detailed, context-aware answers.
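To make the flow described above concrete, here is a minimal sketch of the request an MCP client might send to ask a question about a media file. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification. Everything else is an assumption for illustration: the tool name `ask_about_media`, the argument names `file_path`, `question`, and `media_kind`, and the helper functions are hypothetical and may not match what Perception actually exposes.

```python
import json
import mimetypes

# Hypothetical tool name; the real server may expose a different one.
TOOL_NAME = "ask_about_media"

SUPPORTED_KINDS = {"image", "audio", "video"}


def media_kind(path: str) -> str:
    """Classify a file as image, audio, or video from its extension."""
    mime, _ = mimetypes.guess_type(path)
    kind = (mime or "").split("/")[0]
    if kind not in SUPPORTED_KINDS:
        raise ValueError(f"unsupported media type for {path!r}")
    return kind


def build_tool_call(request_id: int, path: str, question: str) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request as defined by MCP.

    The argument names below are assumptions, not the server's documented schema.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": TOOL_NAME,
            "arguments": {
                "file_path": path,
                "question": question,
                "media_kind": media_kind(path),
            },
        },
    }


if __name__ == "__main__":
    request = build_tool_call(1, "photo.png", "What is in this picture?")
    print(json.dumps(request, indent=2))
```

In practice an MCP client library would send this message over the server's transport. Check the tool list the server reports (via `tools/list`) for the actual tool names and argument schema.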