AgentRobot
Created by rolenet
Empowers large language models with visual and auditory perception, enabling natural human-computer interactions.
About
AgentRobot is a multi-agent human-computer interaction system designed to enhance large language models by integrating visual recognition, speech recognition, and speech synthesis. The system's architecture comprises specialized agents, including a Brain Agent for decision-making, an Eye Agent for visual input processing, an Ear Agent for audio input processing, and a Mouth Agent for audio output. These agents collaborate to achieve a natural and intuitive human-computer interaction experience, giving large language models the ability to 'see,' 'hear,' and 'speak'.
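As a rough illustration of how the four agents described above might hand data to one another, here is a minimal sketch in Python. It is not AgentRobot's actual code: every class, method, and field name (`Percept`, `perceive`, `decide`, `speak`, `run_turn`) is an assumption made for this example, the Eye and Ear agents take plain strings in place of real camera frames and audio, and the Brain Agent uses a fixed template where the real system would presumably call a large language model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Percept:
    """One observation passed from a sensing agent to the Brain Agent."""
    source: str   # "eye" or "ear"
    content: str  # e.g. a recognized name or a transcribed utterance


class EyeAgent:
    """Hypothetical stand-in for visual input processing."""

    def perceive(self, frame: Optional[str]) -> Optional[Percept]:
        # A real agent would run face detection here; we pass the label through.
        return Percept("eye", frame) if frame else None


class EarAgent:
    """Hypothetical stand-in for audio input processing."""

    def perceive(self, audio: Optional[str]) -> Optional[Percept]:
        # A real agent would run speech recognition; we pass the text through.
        return Percept("ear", audio) if audio else None


class BrainAgent:
    """Hypothetical decision-maker; a template replaces the LLM call."""

    def decide(self, percepts: List[Percept]) -> str:
        seen = ", ".join(p.content for p in percepts if p.source == "eye") or "nobody"
        heard = " ".join(p.content for p in percepts if p.source == "ear") or "nothing"
        return f"I see {seen} and I heard: {heard}"


class MouthAgent:
    """Hypothetical stand-in for audio output; records text instead of speaking."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    def speak(self, text: str) -> str:
        self.spoken.append(text)
        return text


def run_turn(eye: EyeAgent, ear: EarAgent, brain: BrainAgent, mouth: MouthAgent,
             frame: Optional[str], audio: Optional[str]) -> str:
    """One interaction turn: gather percepts, decide on a reply, speak it."""
    percepts = [p for p in (eye.perceive(frame), ear.perceive(audio)) if p is not None]
    return mouth.speak(brain.decide(percepts))
```

Keeping each sense behind its own small class is one way to get the modularity the description mentions: in this sketch, a different recognizer could be swapped into `EyeAgent` or `EarAgent` without touching the Brain or Mouth agents.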
Key Features
- Chinese speech recognition and synthesis
- Configurable agent parameters and model settings
- Web interface for visualization and control
- Real-time face detection and recognition
- Multi-agent architecture for modularity
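
The feature list does not say how agent parameters and model settings are configured. One plausible shape, shown purely as a sketch, is a small per-agent settings object with defaults that can be overridden. The `AgentConfig` fields, the default values, and the `apply_overrides` helper below are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical per-agent settings; field names are illustrative only."""
    name: str
    enabled: bool = True
    model: str = "placeholder-model"  # assumed setting, not a real model name
    language: str = "zh"              # assumed default, given Chinese speech support


def apply_overrides(config: AgentConfig, overrides: Dict[str, Any]) -> AgentConfig:
    """Return a copy of config with overrides applied, rejecting unknown keys."""
    known = {f.name for f in fields(config)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return replace(config, **overrides)
```

For example, `apply_overrides(AgentConfig("ear"), {"model": "my-asr-model"})` would return a new configuration for the Ear Agent with only the model changed, while a misspelled key raises an error instead of being silently ignored.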
Use Cases
- Creating interactive virtual assistants
- Developing voice-controlled applications
- Building AI-powered robots with perception capabilities