This project runs a local Model Context Protocol (MCP) server that offers a suite of tools for generative AI asset creation directly on your machine. It supports text-to-image, text-to-audio/music, text-to-speech, and image/text-to-3D model generation. Under the hood it leverages open-source models such as segmind/SSD-1B for fast image generation, stabilityai/stable-audio-open-1.0 for high-quality audio, Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice for multilingual speech, and stabilityai/TripoSR for rapid 3D reconstruction. The server is designed to run efficiently on consumer NVIDIA GPUs, making it a self-contained local solution for diverse AI-driven content creation.
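Because MCP transports JSON-RPC 2.0 messages, a client invokes one of these generation tools with a `tools/call` request. The sketch below builds such a request using only the standard library; the tool name `generate_image` and its arguments are hypothetical placeholders — the actual tool names depend on how this server registers them.

```python
import json

def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request (MCP uses JSON-RPC 2.0 framing)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # "name" and "arguments" identify the tool and its inputs;
        # the specific tool name here is illustrative, not this server's API.
        "params": {"name": tool, "arguments": arguments},
    })

request = build_tool_call(1, "generate_image", {"prompt": "a red fox, studio lighting"})
print(request)
```

An MCP-compatible client such as Cursor or Claude Desktop constructs and sends these messages for you; the sketch only illustrates what crosses the wire.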
Key Features
- Local end-to-end AI asset generation via an MCP server
- Text-to-Image generation using the fast SSD-1B model
- Image/Text-to-3D model reconstruction using TripoSR
- Text-to-Audio and Music generation with Stable Audio Open
- Multilingual Text-to-Speech generation via Qwen3-TTS
Use Cases
- Integrating local generative AI capabilities into MCP-compatible clients such as Cursor or Claude Desktop
- Rapid prototyping of diverse digital assets, including images, sound, speech, and 3D models
- Experimenting with state-of-the-art open-source generative AI models on local hardware
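For the client-integration use case, MCP-compatible desktop clients typically register local servers in a JSON configuration file. A hedged sketch of such an entry follows — the server key `local-genai` and the launch command are placeholders, not this project's documented install path; adjust them to however the server is actually started.

```json
{
  "mcpServers": {
    "local-genai": {
      "command": "python",
      "args": ["-m", "your_server_module"]
    }
  }
}
```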