Automates MS Paint for large language models, letting an agentic workflow open the application, draw shapes, and type text on the model's behalf.
This project addresses the challenge of connecting applications like MS Paint, which lack a direct API, to an MCP (Model Context Protocol) server. It does so by 'hacking' MS Paint through an agentic AI system: a Large Language Model (LLM) automatically opens the application, draws shapes, and types text in response to user queries, with no manual commands. The project demonstrates how to establish LLM control over GUI applications, progressing through increasing levels of prompt engineering and architectural sophistication, from basic automation to a modular design built around cognitive layers (Perception, Memory, Decision-Making, and Action) whose inputs and outputs are validated with Pydantic.
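
Because MS Paint has no API, the bridge is an MCP server that wraps GUI automation as tools the LLM can call. Below is a minimal sketch of that idea, assuming the official `mcp` Python SDK's `FastMCP` helper and `pywinauto` for window automation; the tool names (`open_paint`, `add_text_in_paint`) and automation details are illustrative assumptions, not the project's exact implementation.

```python
# Minimal sketch: expose Paint automation as MCP tools (names are illustrative).
from mcp.server.fastmcp import FastMCP
from pywinauto.application import Application

mcp = FastMCP("mspaint-agent")
_app = None  # handle to the running Paint process


@mcp.tool()
def open_paint() -> str:
    """Launch MS Paint so subsequent tools can draw into it."""
    global _app
    _app = Application(backend="uia").start("mspaint.exe")
    return "Paint opened"


@mcp.tool()
def add_text_in_paint(text: str) -> str:
    """Type text into the active Paint window (assumes the Text tool is selected)."""
    window = _app.window(title_re=".*Paint.*")
    window.set_focus()
    window.type_keys(text, with_spaces=True)
    return f"Typed: {text}"


if __name__ == "__main__":
    # Serve the tools over stdio so an MCP-capable LLM client can call them.
    mcp.run(transport="stdio")
```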
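
The most advanced stage structures the agent loop into typed cognitive layers. The sketch below shows what such Pydantic schemas might look like; the field names are assumptions made for illustration, with Perception and retrieved Memory feeding a Decision that the Action layer executes via the MCP tools above.

```python
# Sketch of the cognitive-layer schemas (field names are illustrative assumptions).
from typing import List
from pydantic import BaseModel, Field


class Perception(BaseModel):
    """Structured interpretation of the raw user query, produced by the LLM."""
    user_query: str
    intent: str = Field(description="e.g. 'draw_rectangle' or 'add_text'")
    entities: List[str] = Field(default_factory=list)


class MemoryItem(BaseModel):
    """A fact retained across iterations of the agent loop."""
    text: str
    relevance: float = 0.0


class Decision(BaseModel):
    """The next MCP tool call chosen by the decision-making layer."""
    tool_name: str  # e.g. 'open_paint' or 'add_text_in_paint'
    arguments: dict = Field(default_factory=dict)
    reasoning: str = ""


class ActionResult(BaseModel):
    """Outcome of executing a Decision against the MCP server."""
    success: bool
    detail: str = ""
```

Validating each hand-off with Pydantic keeps the LLM's free-form output constrained to shapes the action layer can actually execute.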