MS Paint AgenticAI

Automates MS Paint for large language models, letting them control the application and enter text into it through agentic workflows.

About

This project tackles the problem of connecting applications such as MS Paint, which expose no direct API, to an MCP (Model Context Protocol) server. It does so by 'hacking' the MS Paint application: an agentic AI system drives the GUI in place of an API. A Large Language Model (LLM) can then open MS Paint, draw shapes, and enter text in response to a user query, without the user issuing any manual commands. The project shows how to establish LLM control over a GUI application and evolves through several levels of prompt engineering and architectural sophistication, from basic automation to a modular design with separate cognitive layers (Perception, Memory, Decision-Making, and Action) defined with Pydantic.
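The layered design can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the project's actual code: the model classes, the fields, the tool names (`open_paint`, `draw_rectangle`, `add_text`), and the keyword match that stands in for the LLM call in the Perception step. The Action step only records the chosen tool; in the real project it would drive MS Paint through GUI automation.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Perception(BaseModel):
    """Facts extracted from the user's query (hypothetical fields)."""
    shape: str
    text: Optional[str] = None


class Memory(BaseModel):
    """Names of the tools already executed in this session."""
    history: List[str] = Field(default_factory=list)


class Decision(BaseModel):
    """The next tool to call and its arguments."""
    tool: str
    args: dict = Field(default_factory=dict)


def perceive(query: str) -> Perception:
    # Stand-in for an LLM call: a keyword match picks the shape, and
    # the first double-quoted span (if any) is taken as the text.
    shape = "rectangle" if "rectangle" in query.lower() else "line"
    text = query.split('"')[1] if query.count('"') >= 2 else None
    return Perception(shape=shape, text=text)


def decide(p: Perception, m: Memory) -> Optional[Decision]:
    # Pick the first step that has not been executed yet.
    draw_tool = "draw_" + p.shape
    if "open_paint" not in m.history:
        return Decision(tool="open_paint")
    if draw_tool not in m.history:
        return Decision(tool=draw_tool)
    if p.text is not None and "add_text" not in m.history:
        return Decision(tool="add_text", args={"text": p.text})
    return None  # nothing left to do


def act(d: Decision, m: Memory) -> None:
    # Stub: the real action layer would operate the Paint window here.
    m.history.append(d.tool)


def run(query: str) -> List[str]:
    perception = perceive(query)
    memory = Memory()
    decision = decide(perception, memory)
    while decision is not None:
        act(decision, memory)
        decision = decide(perception, memory)
    return memory.history
```

Calling `run('Draw a rectangle and write "Hello" inside')` walks the loop three times: open Paint, draw the rectangle, add the text. Typed models at each boundary mean a malformed value fails validation at the layer that produced it, rather than surfacing later as a wrong click in the GUI.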

Key Features

  • Enables LLM control over GUI applications via agentic workflows
  • Implements modular cognitive layers (Perception, Memory, Decision-Making, Action) using Pydantic for agent architecture
  • Explores advanced prompt engineering for predictable LLM outputs and multi-turn interactions
  • Automates MS Paint interactions without direct API access
  • Integrates with the Gemini API for LLM capabilities and uses pywinauto for Windows GUI automation
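To make "predictable LLM outputs" concrete, here is a minimal sketch of one common approach: the prompt instructs the model to answer with a single line in a fixed format, and the agent rejects anything else before touching the GUI. The `FUNCTION_CALL: name|arg1|arg2` format, the tool names, and the stub tool bodies are all assumptions made for illustration; the project's actual prompt format and tools may differ. The stubs return strings where a real tool would call the GUI automation library.

```python
from typing import Callable, Dict, List, Tuple

PREFIX = "FUNCTION_CALL:"  # hypothetical reply format, fixed by the prompt


def parse_reply(reply: str) -> Tuple[str, List[str]]:
    """Split one model reply into (tool_name, args); reject anything else."""
    line = reply.strip()
    if not line.startswith(PREFIX):
        raise ValueError("unexpected reply format: {!r}".format(line))
    parts = [part.strip() for part in line[len(PREFIX):].split("|")]
    name, args = parts[0], parts[1:]
    if not name:
        raise ValueError("missing tool name")
    return name, args


# Stub tools: each returns a description of what a real tool would do.
def open_paint() -> str:
    return "paint opened"


def draw_rectangle(x1: str, y1: str, x2: str, y2: str) -> str:
    corners = [int(v) for v in (x1, y1, x2, y2)]  # args arrive as strings
    return "rectangle {}".format(corners)


def add_text(text: str) -> str:
    return "typed: {}".format(text)


TOOLS: Dict[str, Callable[..., str]] = {
    "open_paint": open_paint,
    "draw_rectangle": draw_rectangle,
    "add_text": add_text,
}


def dispatch(reply: str) -> str:
    """Parse a reply and run the matching tool from the whitelist."""
    name, args = parse_reply(reply)
    if name not in TOOLS:
        raise KeyError("unknown tool: {}".format(name))
    return TOOLS[name](*args)
```

For example, `dispatch("FUNCTION_CALL: add_text|Hello")` runs the `add_text` stub, while a chatty reply such as "Sure, I will draw it" raises `ValueError`. In a multi-turn loop, that error can be fed back to the model as a request to retry in the required format.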

Use Cases

  • Automating tasks on Windows GUI applications that lack APIs
  • Enabling large language models to interact with and control desktop software
  • Prototyping LLM-driven productivity tools for legacy applications
  • Demonstrating advanced agentic AI capabilities for human-computer interaction