
Baba Is Eval

Evaluates large language models' meta-level reasoning within the interactive puzzle game "Baba Is You".

Introduction

Baba Is Eval is a platform for assessing the meta-level reasoning of large language models in a dynamic, interactive setting. It builds on the rule-manipulation mechanics of the puzzle game "Baba Is You": a model interacts with the game through text commands, moving word blocks to form or break rules. The project includes an MCP server that exposes tools for receiving the game state, executing actions, and controlling game flow, which turns the game into a testbed for AI evaluation.

Key Features

  • Integration with MCP (Model Context Protocol) servers
  • Ability to undo multiple previous actions
  • Execution of chained in-game movement commands
  • Programmatic level entry and exit
  • Dynamic retrieval of game state as a matrix
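
To make the features above concrete, here is a minimal toy sketch of how chained movement commands, multi-step undo, and a matrix-shaped game state might fit together. It is not the project's actual API: the class `ToyLevel`, the methods `execute_commands`, `undo_multiple`, and `game_state`, the comma-separated command format, and the direction names are all hypothetical, and the sketch models only a single movable object on an empty grid rather than the real game's rules.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical direction vocabulary; the real server's commands may differ.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class ToyLevel:
    """Toy stand-in for a level: one movable object on an empty grid."""
    rows: int
    cols: int
    baba: Tuple[int, int]
    history: List[Tuple[int, int]] = field(default_factory=list)

    def execute_commands(self, commands: str) -> None:
        """Apply a comma-separated chain of moves, e.g. "right,right,up"."""
        for name in commands.split(","):
            dr, dc = MOVES[name.strip()]
            r, c = self.baba
            nr, nc = r + dr, c + dc
            # Record the position before every step so each step can be undone.
            self.history.append(self.baba)
            # A move that would leave the grid keeps the object in place.
            if 0 <= nr < self.rows and 0 <= nc < self.cols:
                self.baba = (nr, nc)

    def undo_multiple(self, n: int) -> None:
        """Revert up to n of the most recent steps."""
        for _ in range(min(n, len(self.history))):
            self.baba = self.history.pop()

    def game_state(self) -> List[List[str]]:
        """Return the level as a matrix of cell labels ("." means empty)."""
        grid = [["." for _ in range(self.cols)] for _ in range(self.rows)]
        r, c = self.baba
        grid[r][c] = "baba"
        return grid


level = ToyLevel(rows=3, cols=3, baba=(1, 0))
level.execute_commands("right,right,up")
level.undo_multiple(2)
state = level.game_state()
```

In this sketch, the three chained moves take the object from `(1, 0)` to `(0, 2)`, and undoing two steps returns it to `(1, 1)`. A model-facing tool would presumably return something like `state` as text after each action, so that the model can plan its next commands.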

Use Cases

  • Evaluating large language models' meta-level reasoning
  • Benchmarking AI performance in interactive puzzle-solving
  • Developing and testing AI agents for complex rule-manipulation environments