Transforms single- or multi-view images into 3D reconstructions, optimized for Apple Silicon with Metal Performance Shaders.
VGGT-MPS is an implementation of Facebook Research's Visual Geometry Grounded Transformer (VGGT) model, optimized for Apple Silicon (M1/M2/M3) Macs using Metal Performance Shaders (MPS). It generates 3D scene reconstructions from single- or multi-view image inputs, predicting depth maps, camera poses, and dense 3D point clouds. Sparse attention reduces memory use, which makes city-scale reconstructions feasible, and a unified CLI and a Gradio web interface offer two ways to run the tool. It also integrates with Claude Desktop via the Model Context Protocol, so reconstructions can be driven from a chat session on macOS.
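
To give a rough sense of why sparse attention matters for memory, the sketch below counts attention-score entries for a dense pattern versus a sparse one. It is illustrative arithmetic only, not VGGT-MPS's actual implementation: the function names, the "k keys per query" sparsity pattern, the token count, and the 2-byte (fp16) entry size are all assumptions made for this example.

```python
from typing import Optional


def attention_entries(num_tokens: int, keys_per_query: Optional[int] = None) -> int:
    """Count attention-score entries for a single head (hypothetical helper).

    Dense attention scores every query against every key: n * n entries.
    A sparse pattern that keeps only k keys per query stores n * k entries.
    """
    if keys_per_query is None:
        return num_tokens * num_tokens
    k = min(keys_per_query, num_tokens)  # cannot attend to more keys than exist
    return num_tokens * k


def to_gib(entries: int, bytes_per_entry: int = 2) -> float:
    """Convert an entry count to GiB, assuming fp16 (2 bytes) by default."""
    return entries * bytes_per_entry / 2**30


# Assumed workload: 100,000 image tokens, each attending to 1,000 keys when sparse.
n = 100_000
dense = attention_entries(n)
sparse = attention_entries(n, keys_per_query=1_000)

print(f"dense:  {dense:,} entries, {to_gib(dense):.2f} GiB")
print(f"sparse: {sparse:,} entries, {to_gib(sparse):.2f} GiB")
print(f"reduction factor: {dense // sparse}x")
```

Under these assumed numbers the dense score matrix grows quadratically with the token count while the sparse one grows linearly, which is the general reason a sparse pattern can fit larger scenes into the unified memory of an Apple Silicon machine.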