Dynamically routes AI tasks such as chat, summarization, and sentiment analysis to the most appropriate model using LLM-powered intent classification, all within a lightweight, production-ready FastAPI backend.
Modern AI/LLM deployments often face challenges in managing diverse model backends, making complex routing decisions, and combining outputs from multiple models, all while handling production-grade concerns such as concurrency, streaming, and retries. This project is a proof of concept for an MCP (Model Compute Paradigm) architecture, built with those production concerns in mind: a FastAPI-based microserver that orchestrates multiple AI/LLM models behind a unified, scalable interface. It supports dynamic task routing, LLM-based intent parsing, multi-model pipelines, and streaming chat, and is designed for asynchronous, Dockerized deployment.
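
To make the routing idea concrete, here is a minimal sketch of LLM-powered task dispatch in FastAPI. All names in it (`classify_intent`, the handler functions, the `/route` endpoint) are illustrative assumptions for this sketch, not this project's actual API; the keyword-based classifier stands in for a real LLM call.

```python
# Minimal sketch: classify an incoming prompt into a task, then dispatch
# it to a task-specific handler, streaming output for chat. Assumed names
# throughout; a real deployment would call actual model backends.
import asyncio
from typing import AsyncIterator

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class TaskRequest(BaseModel):
    prompt: str

async def classify_intent(prompt: str) -> str:
    """Placeholder for an LLM call that maps free-form input to a task label.
    A real implementation would prompt a small, fast model and parse its reply."""
    lowered = prompt.lower()
    if "summarize" in lowered:
        return "summarization"
    if "sentiment" in lowered:
        return "sentiment"
    return "chat"

async def run_summarization(prompt: str) -> str:
    return f"[summary of] {prompt}"  # stand-in for a summarization model call

async def run_sentiment(prompt: str) -> str:
    return "positive"  # stand-in for a sentiment model call

async def stream_chat(prompt: str) -> AsyncIterator[str]:
    # Stand-in for token-by-token streaming from a chat model.
    for token in ("Hello", ", ", "world"):
        yield token
        await asyncio.sleep(0)

@app.post("/route")
async def route_task(req: TaskRequest):
    task = await classify_intent(req.prompt)
    if task == "chat":
        # Chat responses stream incrementally; other tasks return JSON.
        return StreamingResponse(stream_chat(req.prompt), media_type="text/plain")
    handlers = {"summarization": run_summarization, "sentiment": run_sentiment}
    handler = handlers.get(task)
    if handler is None:
        raise HTTPException(status_code=400, detail=f"Unknown task: {task}")
    return {"task": task, "result": await handler(req.prompt)}
```

Because every handler is an `async` coroutine, the server can multiplex many in-flight model calls on a single event loop, which is what makes the single unified endpoint scale under concurrent load.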