Model Context Protocol

Module 10 · ~20 min read

Classical RAG is powerful but static — it only knows what is in your vector store. Model Context Protocol (MCP) breaks that ceiling by giving the LLM a set of typed, callable tools that it can invoke at inference time. Instead of just retrieving pre-indexed text, the model can fetch a live webpage, look up a Jira ticket, search GitHub, or query production logs, then fold those results directly into its answer.

Power RAG adds MCP as an optional, feature-flagged layer on top of the standard RAG pipeline. When enabled, every chat request can potentially call real external services — but only when the LLM (or a fast router call) decides they are needed.

What is MCP?

MCP is an open protocol published by Anthropic in late 2024. It standardises the way applications expose tools, resources, and prompts to LLMs, regardless of which model or SDK is being used. Think of it as a USB standard for AI capabilities: you write a tool once (as an MCP server) and any MCP-aware client — Spring AI, LangChain, Claude Desktop, or your own code — can discover and call it.

Key concept: An MCP server advertises a list of callable tools with typed schemas. An MCP client (Spring AI in our case) discovers those tools at startup, wraps them into ToolCallback objects, and attaches them to the ChatClient. The LLM then decides which tools to invoke, passes arguments, and receives structured results.
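To make the discovery handshake concrete, here is a sketch of the JSON-RPC message shapes involved. The field names follow the MCP specification's tools/list and tools/call methods; the get_weather tool and its schema are illustrative, not the actual Power RAG definitions:

```python
import json

# What an MCP client sends to discover tools (JSON-RPC 2.0 over the transport).
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# A server's reply: each tool carries a typed JSON Schema for its arguments.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_weather",
                "description": "Current weather for a location",
                "inputSchema": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            }
        ]
    },
}

# When the LLM picks a tool, the client issues tools/call with arguments
# matching the advertised schema; the result comes back as structured content.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"location": "Montreal"}},
}

print(json.dumps(call_request))
```

Spring AI performs exactly this handshake on startup and turns each entry in the tools array into a ToolCallback.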

Transport: STDIO vs HTTP/SSE

MCP supports two transport modes:

stdio: Spring AI spawns the MCP server as a child process and communicates over stdin/stdout JSON-RPC. Best for local development, Python/Node scripts, and compiled binaries.

http/sse: The MCP server runs as a separate HTTP service; the client connects over Server-Sent Events. Best for production, containerised deployments, and shared tool servers.

Power RAG uses stdio in development — the backend process spawns the Python MCP server on startup. In a production container environment you would switch to the HTTP/SSE transport so the MCP server runs as a separate sidecar with its own lifecycle.
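The stdio plumbing is easy to demystify with a stdlib-only sketch. This is not a real MCP server: the parent spawns a toy child process and exchanges one newline-delimited JSON-RPC message over its stdin/stdout, which is the same framing Spring AI uses when it launches powerrag_mcp_tools.py:

```python
import json
import subprocess
import sys

# A stand-in "server": reads one JSON-RPC line from stdin, answers on stdout.
# (A real MCP server would dispatch on "method"; this toy always replies ok.)
CHILD = (
    "import json, sys\n"
    "req = json.loads(sys.stdin.readline())\n"
    "resp = {'jsonrpc': '2.0', 'id': req['id'], 'result': {'ok': True}}\n"
    "sys.stdout.write(json.dumps(resp) + '\\n')\n"
    "sys.stdout.flush()\n"
)

def roundtrip() -> dict:
    """Spawn the child, send one request line, read one response line."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    request = {"jsonrpc": "2.0", "id": 1, "method": "ping"}
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    response = json.loads(proc.stdout.readline())
    proc.terminate()
    proc.wait()
    return response

print(roundtrip())
```

Because the transport is just a child process and two pipes, there is nothing to deploy in development; the trade-off is that the server's lifecycle is tied to the backend's, which is why production favours HTTP/SSE.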

Adding the Spring AI MCP Client

Spring AI 1.1.2 ships MCP client support as a dedicated starter. Add it alongside the existing AI starters in pom.xml:

backend/pom.xml — MCP client starter
<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>

The starter brings in SyncMcpToolCallbackProvider — a Spring bean that connects to all configured MCP servers on startup and exposes their tools as an array of ToolCallback objects.

Configuration

MCP is disabled by default in application.yml so that production environments do not require MCP servers to be running. The dev profile activates it:

backend/src/main/resources/application.yml — MCP off by default
spring:
  ai:
    mcp:
      client:
        enabled: false       # off in prod; enabled via application-dev.yml
        toolcallback:
          enabled: false

powerrag:
  mcp:
    rag-enabled: false       # second gate: attach tools to ChatClient calls
backend/src/main/resources/application-dev.yml — STDIO connections
spring:
  ai:
    mcp:
      client:
        enabled: true
        type: SYNC
        request-timeout: 120s   # Python cold-start can take a few seconds
        stdio:
          connections:
            powerrag-tools:
              command: python3
              args:
                - mcp/powerrag_mcp_tools.py
            boutquin-email:
              command: mcp/bin/mcp-server-email
              env:
                EMAIL_CONFIG_FILE: mcp/email-accounts.json
        toolcallback:
          enabled: true

powerrag:
  mcp:
    rag-enabled: true

Notice the two-level gate: spring.ai.mcp.client.enabled controls whether the Spring AI MCP client bean is even instantiated; powerrag.mcp.rag-enabled controls whether those tools are attached to each individual chat call. Both must be true for tools to fire.
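Reduced to logic, tools attach only when every gate is open. A hypothetical sketch (the function and parameter names are illustrative, not the actual RagService code):

```python
def tools_should_attach(client_enabled: bool, rag_enabled: bool,
                        intent_wants_tools: bool) -> bool:
    """Tools fire only when all three gates are open: the Spring AI MCP
    client bean exists, powerrag.mcp.rag-enabled is true, and the intent
    router asked for tools on this particular request."""
    return client_enabled and rag_enabled and intent_wants_tools

# Flipping any single gate off disables tools for the call.
print(tools_should_attach(True, True, True))
print(tools_should_attach(True, False, True))
```

The third input is per-request (decided by intent routing), while the first two are per-deployment, which is what makes the split useful for feature flagging.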

How RagService Attaches MCP Tools

SyncMcpToolCallbackProvider is injected as an Optional — absent when the Spring AI bean is not configured, present when it is. RagService resolves the tool array just before the main LLM call:

backend/src/main/java/com/powerrag/rag/service/RagService.java — tool attachment
private final Optional<SyncMcpToolCallbackProvider> mcpToolCallbackProvider;

// At call time, after intent routing decides attachMcpTools=true:
ToolCallback[] mcpTools = attachMcpTools ? resolveMcpToolCallbacks() : null;

if (mcpTools != null && mcpTools.length > 0) {
    promptSpec = promptSpec.toolCallbacks(mcpTools);
}

/** Returns wrapped MCP tool callbacks, or null when MCP is off or unavailable. */
private ToolCallback[] resolveMcpToolCallbacks() {
    if (!mcpRagEnabled || mcpToolCallbackProvider.isEmpty()) {
        return null;
    }
    ToolCallback[] raw = mcpToolCallbackProvider.get().getToolCallbacks();
    if (raw == null || raw.length == 0) {
        return null;
    }
    return ObservingToolCallback.wrapAll(raw, mcpInvocationRecorder);
}

Each tool returned by SyncMcpToolCallbackProvider is wrapped in ObservingToolCallback before being handed to ChatClient. That wrapper records timing and outcome without changing the tool's behaviour. See Topic 31 for the full observability story.
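The wrapper pattern itself is language-agnostic. Here is a hedged Python analogue of what ObservingToolCallback does — time each invocation, record the outcome, and delegate to the underlying tool unchanged (names are illustrative):

```python
import time
from typing import Any, Callable

def observe(tool: Callable[..., Any], recorder: list) -> Callable[..., Any]:
    """Wrap a tool callable so every invocation is timed and recorded,
    without altering the tool's arguments, return value, or exceptions."""
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            result = tool(*args, **kwargs)
            recorder.append({"tool": tool.__name__, "ok": True,
                             "ms": (time.perf_counter() - start) * 1000})
            return result
        except Exception:
            recorder.append({"tool": tool.__name__, "ok": False,
                             "ms": (time.perf_counter() - start) * 1000})
            raise  # the LLM loop still sees the original failure
    return wrapped

def get_current_time() -> str:
    return "2025-01-01T00:00:00Z"

log: list = []
observed = observe(get_current_time, log)
print(observed(), log[0]["tool"], log[0]["ok"])
```

Because the wrapper re-raises on failure, observability never changes what the ChatClient sees; it only adds a side channel of timing and outcome records.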

Tool Discovery Endpoint

The frontend can query which MCP tools are currently active via a dedicated REST endpoint. This powers the McpToolsPanel sidebar that shows users which live-data capabilities are available in their session.

backend/src/main/java/com/powerrag/api/ChatController.java — GET /api/chat/mcp-tools
@GetMapping("/mcp-tools")
public ResponseEntity<McpToolsResponse> getMcpTools() {
    boolean clientAvailable = mcpToolCallbackProvider.isPresent();
    List<McpToolsResponse.McpToolEntry> tools = clientAvailable
        ? Arrays.stream(mcpToolCallbackProvider.get().getToolCallbacks())
                .map(t -> new McpToolsResponse.McpToolEntry(
                        t.getToolDefinition().name(),
                        t.getToolDefinition().description()))
                .toList()
        : List.of();
    return ResponseEntity.ok(new McpToolsResponse(mcpRagEnabled, clientAvailable, tools));
}
backend/src/main/java/com/powerrag/api/McpToolsResponse.java — response DTO
public record McpToolsResponse(
        boolean ragMcpEnabled,       // powerrag.mcp.rag-enabled flag
        boolean mcpClientAvailable,  // SyncMcpToolCallbackProvider bean present
        List<McpToolEntry> tools     // name + description for each registered tool
) {
    public record McpToolEntry(String name, String description) {}
}

Example response when MCP is active with two servers connected:

GET /api/chat/mcp-tools — example response
{
  "ragMcpEnabled": true,
  "mcpClientAvailable": true,
  "tools": [
    { "name": "fetch_url",            "description": "Fetch an https (or http) URL..." },
    { "name": "get_current_time",     "description": "Return current date and time..." },
    { "name": "get_weather",          "description": "Current weather for a location..." },
    { "name": "jira_search_issues",   "description": "Search Jira Cloud with JQL..." },
    { "name": "jira_get_issue",       "description": "Get a single Jira issue..." },
    { "name": "github_search_code",   "description": "Search GitHub code..." },
    { "name": "gcp_logging_query",    "description": "Query Google Cloud Logging..." },
    { "name": "email_list",           "description": "List emails in a mailbox folder..." }
  ]
}
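A consumer of this endpoint only needs the two flags and the tool names. A hedged Python stand-in for the McpToolsPanel logic (the payload below is a trimmed copy of the example response, not live output):

```python
import json

payload = """{
  "ragMcpEnabled": true,
  "mcpClientAvailable": true,
  "tools": [
    {"name": "fetch_url", "description": "Fetch an https (or http) URL..."},
    {"name": "get_weather", "description": "Current weather for a location..."}
  ]
}"""

def active_tool_names(body: str) -> list:
    """Return tool names only when both server-side gates report open;
    otherwise the panel should show no live-data capabilities."""
    data = json.loads(body)
    if not (data["ragMcpEnabled"] and data["mcpClientAvailable"]):
        return []
    return [t["name"] for t in data["tools"]]

print(active_tool_names(payload))
```

Checking both flags client-side mirrors the backend's two-level gate: a connected client with rag-enabled off should render the same as no client at all.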

The Full Flow with MCP

Here is how MCP fits into the complete RAG pipeline. Steps 2 and 5 are the new additions:

Step 1 — Input guardrail The user message is checked by Gemini 2.5 Flash. Harmful prompts are blocked before any retrieval or tool call.
Step 2 — Intent routing (new) A fast LLM call (or heuristic fallback) decides: should the knowledge base be searched? Should MCP tools be attached? See Topic 30 for details.
Step 3 — Hybrid retrieval (conditional) If intent.retrieveDocuments() is true, Qdrant + PostgreSQL FTS runs as normal. Skipped for purely general questions or live-data requests.
Step 4 — Context assembly Retrieved chunks are assembled into the system prompt context, up to 24 000 chars.
Step 5 — LLM call with MCP tools (new) If attachMcpTools is true and the global flag is on, tool callbacks are attached to ChatClient. The LLM may invoke zero or more tools in a multi-turn tool-use loop before producing its final answer. Each invocation is recorded by ObservingToolCallback.
Step 6 — Output guardrail PII detection and redaction on the model's final answer.
Step 7 — Semantic cache store + audit log The answer, sources, and MCP invocation summaries are persisted to PostgreSQL. Tool invocations are stored as JSONB in the interactions.mcp_invocations column.
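Stubbing each stage, the control flow of the seven steps can be sketched as follows. Every function and field name here is hypothetical, and the guardrail, retrieval, and LLM stages are reduced to trivial stand-ins so only the branching is visible:

```python
from dataclasses import dataclass, field

@dataclass
class Intent:
    retrieve_documents: bool   # step 3 runs only if true
    attach_mcp_tools: bool     # step 5 attaches tools only if true

@dataclass
class Answer:
    text: str
    sources: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)

def answer(question: str, *, mcp_rag_enabled: bool = True) -> Answer:
    # Step 1: input guardrail (stubbed: nothing is blocked here).
    # Step 2: intent routing decides retrieval and tool attachment.
    intent = Intent(retrieve_documents="docs" in question,
                    attach_mcp_tools="weather" in question)
    # Step 3: hybrid retrieval, conditional on the router's decision.
    chunks = ["chunk-1"] if intent.retrieve_documents else []
    # Step 4: context assembly (stub: join retrieved chunks).
    context = "\n".join(chunks)
    # Step 5: LLM call; tools attach only behind both gates.
    tools = ["get_weather"] if (intent.attach_mcp_tools and mcp_rag_enabled) else []
    text = f"answer(context={context or 'none'}, tools={tools})"
    # Steps 6-7: output guardrail and persistence are omitted in this sketch.
    return Answer(text=text, sources=chunks, tool_calls=tools)

print(answer("what is the weather today?"))
```

The point of the sketch is the two independent conditionals: a purely general question skips retrieval, and a knowledge-base question skips tools, so neither cost is paid unconditionally.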

Key Design Decisions

MCP off by default: avoids requiring Python/binary dependencies in production containers unless explicitly opted in.

Optional<SyncMcpToolCallbackProvider>: the bean is absent when spring.ai.mcp.client.enabled=false, so the code never NPEs even without the dev profile.

Two-level gate: separates "can the client connect?" from "should this call use tools?" — useful for A/B testing or per-user feature flags in the future.

Sync (blocking) transport: Spring AI 1.1.2's async MCP support was incomplete at the time of implementation; SYNC is simpler and correct for the thread-per-request model.

Cache hits skip MCP: a cached answer already has a good response; re-running expensive tool calls would waste latency and API quota for no benefit.
Tip: During local development, run ./mvnw spring-boot:run -Dspring-boot.run.profiles=dev to activate the dev profile and start MCP servers automatically. You do not need to start the Python server manually.
Warning: The request-timeout: 120s in the dev profile is intentionally generous to allow for Python cold starts. In production with a pre-warmed HTTP/SSE server you should lower this to 10–20 seconds to prevent tool call stalls from blocking the HTTP request.