Model Context Protocol
Classical RAG is powerful but static — it only knows what is in your vector store. Model Context Protocol (MCP) breaks that ceiling by giving the LLM a set of typed, callable tools that it can invoke at inference time. Instead of just retrieving pre-indexed text, the model can fetch a live webpage, look up a Jira ticket, search GitHub, or query production logs, then fold those results directly into its answer.
Power RAG adds MCP as an optional, feature-flagged layer on top of the standard RAG pipeline. When enabled, every chat request can potentially call real external services — but only when the LLM (or a fast router call) decides they are needed.
What is MCP?
MCP is an open protocol published by Anthropic in late 2024. It standardises the way applications expose tools, resources, and prompts to LLMs, regardless of which model or SDK is being used. Think of it as a USB standard for AI capabilities: you write a tool once (as an MCP server) and any MCP-aware client — Spring AI, LangChain, Claude Desktop, or your own code — can discover and call it.
In Power RAG, Spring AI acts as the MCP client: it discovers each connected server's tools, wraps them as ToolCallback objects, and attaches them to the ChatClient. The LLM then decides which tools to invoke, passes arguments, and receives structured results.
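To make tool discovery concrete, here is a sketch (Python, standard library only) of the shape an MCP server advertises for the protocol's standard tools/list call. The get_weather tool and its schema are illustrative stand-ins, not the project's actual definitions.

```python
import json

# Illustrative tools/list result: every MCP tool is advertised with a name,
# a human-readable description, and a JSON Schema for its arguments.
tools_list_result = {
    "tools": [
        {
            "name": "get_weather",
            "description": "Current weather for a location",
            "inputSchema": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        }
    ]
}

# A client such as Spring AI turns each entry into a callable tool definition.
print(json.dumps([t["name"] for t in tools_list_result["tools"]]))
```

The inputSchema is what lets the LLM construct valid arguments without ever seeing the server's source code.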
Transport: STDIO vs HTTP/SSE
MCP supports two transport modes:
| Transport | How it works | Best for |
|---|---|---|
| stdio | Spring AI spawns the MCP server as a child process and communicates over stdin/stdout JSON-RPC | Local development, Python/Node scripts, compiled binaries |
| http/sse | MCP server runs as a separate HTTP service; the client connects over Server-Sent Events | Production, containerised deployments, shared tool servers |
Power RAG uses stdio in development — the backend process spawns the Python MCP server
on startup. In a production container environment you would switch to the HTTP/SSE transport so the
MCP server runs as a separate sidecar with its own lifecycle.
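To make the stdio transport concrete, here is a minimal sketch of the framing: client and server exchange newline-delimited JSON-RPC 2.0 messages over the child process's stdin/stdout. The frame helper is hypothetical; the tool name mirrors one listed later in this topic.

```python
import json

def frame(message: dict) -> bytes:
    """Serialise one JSON-RPC message for the stdio transport:
    a single JSON object followed by a newline."""
    return (json.dumps(message) + "\n").encode("utf-8")

# A tools/call request as the client would write it to the server's stdin.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_current_time", "arguments": {}},
}

wire = frame(request)
decoded = json.loads(wire.decode("utf-8"))  # what the server parses back
```

The HTTP/SSE transport carries the same JSON-RPC payloads; only the channel changes, which is why switching transports in production requires no changes to the tools themselves.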
Adding the Spring AI MCP Client
Spring AI 1.1.2 ships MCP client support as a dedicated starter. Add it alongside the existing AI
starters in pom.xml:
```xml
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>
```
The starter brings in SyncMcpToolCallbackProvider — a Spring bean that connects to all
configured MCP servers on startup and exposes their tools as an array of ToolCallback
objects.
Configuration
MCP is disabled by default in application.yml so that production
environments do not require MCP servers to be running. The baseline configuration keeps both flags off:
```yaml
spring:
  ai:
    mcp:
      client:
        enabled: false        # off in prod; enabled via application-dev.yml
        toolcallback:
          enabled: false

powerrag:
  mcp:
    rag-enabled: false        # second gate: attach tools to ChatClient calls
```
The dev profile (application-dev.yml) flips both gates on and defines the stdio server connections:

```yaml
spring:
  ai:
    mcp:
      client:
        enabled: true
        type: SYNC
        request-timeout: 120s # Python cold-start can take a few seconds
        stdio:
          connections:
            powerrag-tools:
              command: python3
              args:
                - mcp/powerrag_mcp_tools.py
            boutquin-email:
              command: mcp/bin/mcp-server-email
              env:
                EMAIL_CONFIG_FILE: mcp/email-accounts.json
        toolcallback:
          enabled: true

powerrag:
  mcp:
    rag-enabled: true
```
Notice the two-level gate: spring.ai.mcp.client.enabled controls whether the Spring
AI MCP client bean is even instantiated; powerrag.mcp.rag-enabled controls whether
those tools are attached to each individual chat call. Both must be true for tools to
fire.
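The combined decision can be written as a single predicate. A Python sketch with hypothetical names, mirroring the Java logic rather than reproducing it: the client bean must exist, the rag-enabled gate must be open, and the intent router must have asked for tools on this particular request.

```python
def should_attach_tools(client_available: bool, rag_enabled: bool,
                        router_wants_tools: bool) -> bool:
    """All three conditions must hold before tool callbacks join a chat call:
    the MCP client bean exists, the powerrag.mcp.rag-enabled gate is open,
    and intent routing decided this request needs live data."""
    return client_available and rag_enabled and router_wants_tools

# Tools fire only in the fully-enabled case:
assert should_attach_tools(True, True, True)
assert not should_attach_tools(True, False, True)   # rag gate closed
assert not should_attach_tools(False, True, True)   # client bean absent
```

Keeping the per-request condition separate from the config gates is what makes future per-user or A/B flags a one-line change.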
How RagService Attaches MCP Tools
SyncMcpToolCallbackProvider is injected as an Optional — absent when the
Spring AI bean is not configured, present when it is. RagService resolves the tool
array just before the main LLM call:
```java
private final Optional<SyncMcpToolCallbackProvider> mcpToolCallbackProvider;

// At call time, after intent routing decides attachMcpTools=true:
ToolCallback[] mcpTools = attachMcpTools ? resolveMcpToolCallbacks() : null;
if (mcpTools != null && mcpTools.length > 0) {
    promptSpec = promptSpec.toolCallbacks(mcpTools);
}

/** Returns wrapped MCP tool callbacks, or null when MCP is off or unavailable. */
private ToolCallback[] resolveMcpToolCallbacks() {
    if (!mcpRagEnabled || mcpToolCallbackProvider.isEmpty()) {
        return null;
    }
    ToolCallback[] raw = mcpToolCallbackProvider.get().getToolCallbacks();
    if (raw == null || raw.length == 0) {
        return null;
    }
    return ObservingToolCallback.wrapAll(raw, mcpInvocationRecorder);
}
```
Each tool returned by SyncMcpToolCallbackProvider is wrapped in
ObservingToolCallback before being handed to ChatClient. That wrapper
records timing and outcome without changing the tool's behaviour. See
Topic 31 for the full observability story.
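The wrapper is a plain decorator. Here is a language-agnostic sketch of the idea in Python with hypothetical names (the real ObservingToolCallback is Java): wrap a tool so every invocation records its duration and outcome without changing the tool's behaviour.

```python
import time
from typing import Any, Callable

def observing(tool: Callable[..., Any],
              record: Callable[[str, float, bool], None],
              name: str) -> Callable[..., Any]:
    """Decorate a tool so each call records (name, duration, success)
    via the supplied recorder, then returns or raises exactly as before."""
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            result = tool(*args, **kwargs)
            record(name, time.perf_counter() - start, True)
            return result
        except Exception:
            record(name, time.perf_counter() - start, False)
            raise
    return wrapped

# Usage: wrap a fake tool and capture one invocation record.
records = []
clock = observing(lambda: "12:00",
                  lambda n, d, ok: records.append((n, ok)),
                  "get_current_time")
print(clock())    # behaves exactly like the original tool: 12:00
print(records)    # [('get_current_time', True)]
```

Because failures are recorded and re-raised rather than swallowed, the LLM still sees the tool error while the metrics pipeline sees the outcome.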
Tool Discovery Endpoint
The frontend can query which MCP tools are currently active via a dedicated REST endpoint. This
powers the McpToolsPanel sidebar that shows users which live-data capabilities are
available in their session.
```java
@GetMapping("/mcp-tools")
public ResponseEntity<McpToolsResponse> getMcpTools() {
    boolean clientAvailable = mcpToolCallbackProvider.isPresent();
    List<McpToolsResponse.McpToolEntry> tools = clientAvailable
            ? Arrays.stream(mcpToolCallbackProvider.get().getToolCallbacks())
                    .map(t -> new McpToolsResponse.McpToolEntry(
                            t.getToolDefinition().name(),
                            t.getToolDefinition().description()))
                    .toList()
            : List.of();
    return ResponseEntity.ok(new McpToolsResponse(mcpRagEnabled, clientAvailable, tools));
}

public record McpToolsResponse(
        boolean ragMcpEnabled,       // powerrag.mcp.rag-enabled flag
        boolean mcpClientAvailable,  // SyncMcpToolCallbackProvider bean present
        List<McpToolEntry> tools     // name + description for each registered tool
) {
    public record McpToolEntry(String name, String description) {}
}
```
Example response when MCP is active with two servers connected:
```json
{
  "ragMcpEnabled": true,
  "mcpClientAvailable": true,
  "tools": [
    { "name": "fetch_url", "description": "Fetch an https (or http) URL..." },
    { "name": "get_current_time", "description": "Return current date and time..." },
    { "name": "get_weather", "description": "Current weather for a location..." },
    { "name": "jira_search_issues", "description": "Search Jira Cloud with JQL..." },
    { "name": "jira_get_issue", "description": "Get a single Jira issue..." },
    { "name": "github_search_code", "description": "Search GitHub code..." },
    { "name": "gcp_logging_query", "description": "Query Google Cloud Logging..." },
    { "name": "email_list", "description": "List emails in a mailbox folder..." }
  ]
}
```
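A consumer such as the McpToolsPanel sidebar only needs the two flags and the name/description pairs. A small parsing sketch (standard library only, payload abbreviated to two tools):

```python
import json

# Parse the /mcp-tools payload the way a frontend panel might.
payload = json.loads("""
{
  "ragMcpEnabled": true,
  "mcpClientAvailable": true,
  "tools": [
    { "name": "fetch_url", "description": "Fetch an https (or http) URL..." },
    { "name": "get_current_time", "description": "Return current date and time..." }
  ]
}
""")

# Tools can only fire when both flags are on, so the panel shows the
# tool list as "active" only in that case.
active = payload["ragMcpEnabled"] and payload["mcpClientAvailable"]
names = [t["name"] for t in payload["tools"]]
print("MCP active:", active)   # MCP active: True
print("tools:", names)         # tools: ['fetch_url', 'get_current_time']
```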
The Full Flow with MCP
Here is how MCP fits into the complete RAG pipeline; the tool-attachment and invocation-recording steps are the new additions:

- When intent.retrieveDocuments() is true, hybrid retrieval (Qdrant + PostgreSQL FTS) runs as normal. It is skipped for purely general questions or live-data requests.
- When attachMcpTools is true and the global flag is on, tool callbacks are attached to the ChatClient call. The LLM may invoke zero or more tools in a multi-turn tool-use loop before producing its final answer.
- Each invocation is recorded by ObservingToolCallback and persisted in the interactions.mcp_invocations column.
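Putting the steps together, the per-request control flow can be sketched as follows (Python pseudocode with stubbed, hypothetical stand-ins; the real pipeline lives in the Java RagService):

```python
def handle_chat(question: str, intent: dict,
                mcp_client_available: bool, mcp_rag_enabled: bool) -> dict:
    """Sketch of the request flow: optional retrieval, optional tools, one LLM call."""
    context = []
    if intent["retrieveDocuments"]:
        context = retrieve_hybrid(question)      # Qdrant + PostgreSQL FTS
    tools = []
    if intent["attachMcpTools"] and mcp_client_available and mcp_rag_enabled:
        tools = resolve_mcp_tools()              # each one wrapped in an observer
    return call_llm(question, context, tools)    # LLM may loop over tools

# Minimal stubs so the sketch runs end to end:
def retrieve_hybrid(q): return ["doc-1"]
def resolve_mcp_tools(): return ["get_current_time"]
def call_llm(q, ctx, tools): return {"context": ctx, "tools": tools}

# A live-data question: retrieval skipped, tools attached.
result = handle_chat("what time is it?",
                     {"retrieveDocuments": False, "attachMcpTools": True},
                     mcp_client_available=True, mcp_rag_enabled=True)
```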
Key Design Decisions
| Decision | Rationale |
|---|---|
| MCP off by default | Avoids requiring Python/binary dependencies in production containers unless explicitly opted in |
| Optional&lt;SyncMcpToolCallbackProvider&gt; | The bean is absent when spring.ai.mcp.client.enabled=false, so the code never NPEs even without the dev profile |
| Two-level gate | Separates "can the client connect?" from "should this call use tools?" — useful for A/B testing or per-user feature flags in the future |
| Sync (blocking) transport | Spring AI 1.1.2's async MCP support was incomplete at time of implementation; SYNC is simpler and correct for the thread-per-request model |
| Cache hits skip MCP | A cached answer already has a good response; re-running expensive tool calls would waste latency and API quota for no benefit |
Run ./mvnw spring-boot:run -Dspring-boot.run.profiles=dev to activate the dev profile and start the MCP servers
automatically. You do not need to start the Python server manually.
The request-timeout: 120s setting in the dev profile is intentionally
generous to allow for Python cold starts. In production, with a pre-warmed HTTP/SSE server, you should
lower it to 10–20 seconds to prevent stalled tool calls from blocking the HTTP request.