Enables natural language control of a humanoid robot through an integrated voice agent and large language model.

Overview

This Python-based MCP server transforms the Baby Brewie robot into an intuitive, voice-controlled system. It integrates a dedicated voice agent for wake-word activation and speech recognition, sending commands to a large language model (LLM) via the Together API. The server also provides the LLM with context from pre-defined robot action groups, allowing for intelligent execution of complex tasks. Responses are voiced using gTTS and cached for reduced latency, while robust ROS communication ensures seamless robot interaction.
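To illustrate how pre-defined action groups might be supplied to the LLM as context, here is a minimal sketch. The action-group names, descriptions, and prompt wording are illustrative assumptions, not the server's actual format.

```python
# Hypothetical sketch: render known robot action groups into a system prompt
# so the LLM can map free-form requests onto them. All names below are
# illustrative assumptions, not taken from the actual server.

ACTION_GROUPS = {
    "wave_hello": "Wave the right arm in greeting.",
    "walk_forward": "Take several steps forward.",
    "bow": "Bend at the waist in a short bow.",
}

def build_system_prompt(action_groups: dict) -> str:
    """List the available action groups so the LLM can pick the best match
    for a natural-language request, without requiring exact naming."""
    lines = ["You control a humanoid robot. Available action groups:"]
    for name, description in action_groups.items():
        lines.append(f"- {name}: {description}")
    lines.append(
        "Reply with the single action group name that best matches the user's request."
    )
    return "\n".join(lines)

prompt = build_system_prompt(ACTION_GROUPS)
```

In this sketch the prompt would be sent as the system message of a Together API chat-completion request, with the user's transcribed speech as the user message.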

Key Features

  • Robust ROS Communication: Employs a rewritten WebSocket manager built on roslibpy for reliable, JSON-based ROS message passing.
  • Contextual Action Execution: Passes pre-defined robot action groups to the LLM, enabling intelligent, context-aware command execution without exact naming.
  • LLM Integration: Connects with large language models (e.g., Together AI) to interpret natural language commands and control robot actions.
  • Extensible Toolset: Provides additional MCP functions like image capture, precise step control, and named action execution.
  • Advanced Voice Control: Features wake-word activation (Porcupine), speech recognition (SpeechRecognition), LLM interaction, and voiced responses (gTTS) with caching.
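The response caching mentioned above can be sketched as follows: identical phrases map to the same audio file, so gTTS is only invoked on a cache miss. The cache directory name and hashing scheme are illustrative assumptions.

```python
# Sketch of a gTTS response cache for reduced latency. The hashing scheme
# and cache location are assumptions for illustration.
import hashlib
from pathlib import Path

CACHE_DIR = Path("tts_cache")  # assumed location, not the server's actual path

def cache_path(text: str, lang: str = "en") -> Path:
    """Derive a stable MP3 filename from the response text and language."""
    digest = hashlib.sha256(f"{lang}:{text}".encode("utf-8")).hexdigest()[:16]
    return CACHE_DIR / f"{digest}.mp3"

def speak(text: str) -> Path:
    """Return a cached MP3 for `text`, synthesizing it with gTTS on a miss."""
    path = cache_path(text)
    if not path.exists():
        CACHE_DIR.mkdir(exist_ok=True)
        from gtts import gTTS  # imported lazily; requires the gTTS package
        gTTS(text=text, lang="en").save(str(path))
    return path
```

Because the filename is a pure function of the text, repeated LLM responses (greetings, confirmations) are voiced instantly from disk instead of hitting the gTTS service each time.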

Use Cases

  • Automating complex robot tasks through intelligent contextual interpretation of user requests.
  • Operating humanoid robots using natural spoken language commands.
  • Developing advanced human-robot interaction systems powered by generative AI.