Evaluates large language models' meta-level reasoning within the interactive puzzle game "Baba Is You".
Baba Is Eval is a platform for assessing the meta-level reasoning of large language models in a dynamic, interactive setting. It builds on the rule-manipulation mechanics of the puzzle game "Baba Is You": models interact with the game through text commands, manipulating word blocks to form rules. The project includes an MCP (Model Context Protocol) server that exposes tools for receiving game states, executing actions, and controlling game flow, which turns the game into a testbed for evaluating language models.
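
As a rough illustration of the rule-forming mechanic described above, the sketch below scans a grid of word tiles for `NOUN IS PROPERTY` triples read left to right or top to bottom. It is a toy model, not code from Baba Is Eval: the grid representation, the word lists, and the `active_rules` function are all assumptions made for this example, and the real game supports more syntax (such as `AND` and `NOT`) than is handled here.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation: a rectangular grid where each cell holds
# a word tile (as an upper-case string) or None for an empty cell.
Grid = List[List[Optional[str]]]

# Illustrative word lists; the real game has many more words.
NOUNS = {"BABA", "WALL", "ROCK", "FLAG"}
PROPERTIES = {"YOU", "WIN", "STOP", "PUSH"}


def active_rules(grid: Grid) -> Set[Tuple[str, str]]:
    """Return the (subject, predicate) pairs spelled out on the grid.

    A rule is recognized when an IS tile has a noun immediately before it
    and a property or noun immediately after it, horizontally or vertically.
    """
    rules: Set[Tuple[str, str]] = set()

    def consider(subject: Optional[str], predicate: Optional[str]) -> None:
        if subject in NOUNS and (predicate in PROPERTIES or predicate in NOUNS):
            rules.add((subject, predicate))

    height = len(grid)
    for y, row in enumerate(grid):
        for x, word in enumerate(row):
            if word != "IS":
                continue
            # Horizontal reading: left neighbor IS right neighbor.
            if 0 < x < len(row) - 1:
                consider(row[x - 1], row[x + 1])
            # Vertical reading: upper neighbor IS lower neighbor.
            if 0 < y < height - 1:
                consider(grid[y - 1][x], grid[y + 1][x])
    return rules


example_grid: Grid = [
    ["BABA", "IS", "YOU", None],
    [None, None, None, "FLAG"],
    [None, None, None, "IS"],
    [None, None, None, "WIN"],
]

print(sorted(active_rules(example_grid)))
# [('BABA', 'YOU'), ('FLAG', 'WIN')]
```

Removing or moving a single tile changes the rule set. For instance, clearing the `WIN` tile in `example_grid` leaves only `BABA IS YOU` active. Reasoning about this kind of rule change is what the evaluation probes.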