Agent Cascade
Routes requests from Model Context Protocol clients to large language models hosted locally via Ollama or LM Studio.
Introduction
Agent Cascade bridges Model Context Protocol (MCP) clients, such as Windsurf/Cascade, to local language models hosted via platforms like LM Studio or Ollama. The server exposes a chat-completion tool, letting developers route AI requests directly to their own local models instead of relying on external hosted APIs. Environment variables configure the local model's base URL and the default model, and the server also supports advanced patterns such as self-reflection, where a model can "ask itself" through sub-calls.
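A minimal sketch of this arrangement, written against the official MCP TypeScript SDK, is shown below: a `local_chat` tool that forwards prompts to a local OpenAI-compatible `/chat/completions` endpoint. The environment variable names (`LOCAL_LLM_BASE_URL`, `LOCAL_LLM_DEFAULT_MODEL`), default values, and server name are illustrative assumptions rather than the project's actual configuration.

```typescript
// Sketch only: an MCP server exposing a `local_chat` tool that forwards chat
// requests to a local OpenAI-compatible endpoint (LM Studio or Ollama).
// Requires Node 18+ for built-in fetch. The env var names and defaults below
// are illustrative assumptions, not the project's actual configuration.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const BASE_URL = process.env.LOCAL_LLM_BASE_URL ?? "http://localhost:1234/v1";
const DEFAULT_MODEL = process.env.LOCAL_LLM_DEFAULT_MODEL ?? "llama-3.1-8b-instruct";

const server = new McpServer({ name: "local-chat-sketch", version: "0.1.0" });

// Register the chat-completion tool; the MCP client supplies a prompt and may
// override the model, otherwise the configured default is used.
server.tool(
  "local_chat",
  { prompt: z.string(), model: z.string().optional() },
  async ({ prompt, model }) => {
    // Forward to the local server's OpenAI-compatible /chat/completions route.
    const res = await fetch(`${BASE_URL}/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: model ?? DEFAULT_MODEL,
        messages: [{ role: "user", content: prompt }],
      }),
    });
    const data = await res.json();
    const text = data.choices?.[0]?.message?.content ?? "";
    return { content: [{ type: "text" as const, text }] };
  }
);

// Serve over stdio so MCP clients such as Windsurf/Cascade can spawn it.
await server.connect(new StdioServerTransport());
```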
Key Features
- Provides a `local_chat` MCP tool for chat completions
- Supports `local.chat` as a direct request handler for custom MCP methods
- Configurable base URL for local LM Studio/Ollama-compatible APIs
- Allows setting a default model for calls where none is specified
- Enables self-reflection and same-model sub-calls with controlled budgets (see the sketch after this list)
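The controlled-budget behaviour mentioned in the feature list can be pictured as a loop that caps the number of same-model sub-calls per request. The sketch below is illustrative only; the `localChat` helper and the `budget` parameter are hypothetical and not taken from the project's code.

```typescript
// Illustrative pattern only: a bounded self-reflection loop. `localChat` stands
// in for whatever function routes a prompt through the local_chat tool; the
// budget caps how many same-model sub-calls a single request may trigger.
type ChatFn = (prompt: string) => Promise<string>;

async function reflectAndAnswer(
  localChat: ChatFn,
  question: string,
  budget = 3 // total calls allowed, including the initial draft
): Promise<string> {
  let answer = await localChat(question);
  for (let calls = 1; calls < budget; calls++) {
    // Self-ask sub-call: the same model critiques its own draft.
    const critique = await localChat(
      `Question: ${question}\nDraft answer: ${answer}\n` +
        `Reply "OK" if the draft is correct; otherwise reply with a revised answer.`
    );
    if (critique.trim() === "OK") break;
    answer = critique;
  }
  return answer;
}
```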
Use Cases
- Integrating local LLMs (Ollama, LM Studio) with Windsurf/Cascade and other MCP clients (see the usage sketch after this list)
- Implementing self-ask or reflection patterns with local models for multi-step agent workflows
- Developing AI applications that leverage private or custom-tuned local language models
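For a custom integration, one way to exercise the tool is to spawn the server over stdio from an MCP client and call `local_chat` programmatically, as in the sketch below using the MCP TypeScript client SDK; the launch command and prompt are assumptions for illustration.

```typescript
// Hypothetical usage from a custom MCP client: spawn the server over stdio and
// call its local_chat tool. The launch command ("node build/index.js") is an
// assumption for illustration, not the project's documented entry point.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "example-client", version: "0.1.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["build/index.js"] })
);

const result = await client.callTool({
  name: "local_chat",
  arguments: { prompt: "Summarize the MCP handshake in one sentence." },
});
console.log(result); // result.content holds the tool's text output
```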