About
VLArm is designed to replicate Google DeepMind's Vision-Language-Action (VLA) approach, transforming an existing robot, HuggingFace's SO-101 arm, into an intelligent system capable of making decisions autonomously. By integrating a locally running MCP server with a Large Language Model (LLM) served by Ollama, the project enables the robotic arm to execute complex actions from natural language commands. It lays the groundwork for advanced industrial applications in which robots interpret and respond to human instructions, with live vision and listening capabilities planned for the future.
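
To give a sense of how the MCP side of this architecture can look, here is a minimal sketch of a local server that exposes a single arm-control tool for an LLM to call. It assumes the official MCP Python SDK's FastMCP interface; the server name, the `move_joint` tool, its joint/angle parameters, and the stubbed motion code are illustrative assumptions, not the project's actual implementation.

```python
# Minimal sketch: a local MCP server exposing one arm-control tool.
# Assumes the official MCP Python SDK (FastMCP); tool name, parameters,
# and the stubbed motion code are illustrative only.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("vlarm")

@mcp.tool()
def move_joint(joint: str, degrees: float) -> str:
    """Rotate one SO-101 joint to an absolute angle in degrees."""
    # A real implementation would command the SO-101 servos here;
    # this stub only reports the intended motion.
    return f"Moving joint '{joint}' to {degrees:.1f} degrees"

if __name__ == "__main__":
    # An LLM client (e.g. an Ollama-backed agent) connects to this server
    # and invokes move_joint in response to natural language commands.
    mcp.run()
```

Because both the MCP server and the Ollama model run locally, commands stay on the machine that drives the arm, which fits the local-only setup described above.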