Tool Observability & Audit
When an LLM calls external tools during a conversation, you need to know exactly which tools fired, how long each took, and whether they succeeded. Without this visibility, a slow or failing tool call is a black box — you cannot tell from the final answer alone why a response took five seconds or contained inaccurate information.
Power RAG instruments every MCP tool invocation transparently, without modifying the MCP servers or the Spring AI tool pipeline. The data flows from the wrapper class through a thread-local buffer, into the API response, and finally into the PostgreSQL audit log — giving you full traceability at every layer.
The Data Model: McpToolInvocationSummary
One record is created per tool call, per chat turn:
@JsonInclude(JsonInclude.Include.NON_NULL)
public record McpToolInvocationSummary(
String serverId, // inferred from tool name prefix, e.g. "powerrag-tools"
String toolName, // exact tool name, e.g. "jira_search_issues"
boolean success, // true if the call returned without exception
long durationMs, // client-side wall clock time for the call
String errorMessage, // null on success; truncated to 200 chars on failure
String argsSummary // tool input summary, max 200 chars; null if blank
) {}
@JsonInclude(NON_NULL) keeps the JSON compact — errorMessage and argsSummary are omitted when null, which is the common case for successful calls with simple inputs.
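The effect is easy to check directly with Jackson (assuming jackson-databind on the classpath; the record below is a copy of the one above, and NonNullJsonDemo is just a scratch class for this sketch):

```java
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.databind.ObjectMapper;

public class NonNullJsonDemo {
    @JsonInclude(JsonInclude.Include.NON_NULL)
    record McpToolInvocationSummary(
            String serverId, String toolName, boolean success,
            long durationMs, String errorMessage, String argsSummary) {}

    static String toJson(McpToolInvocationSummary summary) throws Exception {
        return new ObjectMapper().writeValueAsString(summary);
    }

    public static void main(String[] args) throws Exception {
        var ok = new McpToolInvocationSummary(
                "powerrag-tools", "jira_get_issue", true, 312, null, null);
        // The null errorMessage and argsSummary components are dropped from the
        // output entirely — no "errorMessage":null noise in the stored JSON.
        System.out.println(toJson(ok));
    }
}
```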
Layer 1 — ObservingToolCallback
This is the decorator that wraps every ToolCallback from the MCP provider. It
intercepts each call, times it, and posts a summary to the recorder. The model never knows it is
being observed.
private String invoke(Supplier<String> supplier, String toolInput) {
String toolName = delegate.getToolDefinition().name();
String serverId = inferServerId(toolName); // "powerrag-tools__jira_search" → "powerrag-tools"
long t0 = System.currentTimeMillis();
try {
String out = normalizeMcpToolOutput(supplier.get());
long ms = System.currentTimeMillis() - t0;
recorder.record(new McpToolInvocationSummary(
serverId, toolName, true, ms, null, summarizeArgs(toolInput)));
return out;
} catch (RuntimeException e) {
long ms = System.currentTimeMillis() - t0;
String msg = e.getMessage();
if (msg != null && msg.length() > 200) msg = msg.substring(0, 200) + "…";
recorder.record(new McpToolInvocationSummary(
serverId, toolName, false, ms, msg, summarizeArgs(toolInput)));
throw e; // re-throw so Spring AI can handle tool failures
}
}
Notice that the exception is re-thrown after recording. The observer records the failure but does not swallow it — Spring AI still gets to decide whether to retry or surface the error to the model.
Server ID is inferred from the tool name prefix, using a double-underscore separator that Spring AI introduces when namespacing tools from multiple MCP servers:
static String inferServerId(String toolName) {
if (toolName == null) return "mcp";
int sep = toolName.indexOf("__");
if (sep > 0) return toolName.substring(0, sep); // "powerrag-tools__fetch_url" → "powerrag-tools"
return "mcp"; // no namespace prefix: generic fallback
}
Layer 2 — McpInvocationRecorder
The recorder collects invocations in a ThreadLocal list. Because Spring MVC handles
each HTTP request on a single thread, every tool call made during one chat turn naturally lands in
the same list, without any synchronisation required.
@Component
public class McpInvocationRecorder {
private final ThreadLocal<List<McpToolInvocationSummary>> current =
ThreadLocal.withInitial(ArrayList::new);
public void clear() { current.get().clear(); }
public void record(McpToolInvocationSummary summary) { current.get().add(summary); }
/** Returns an immutable snapshot and clears the buffer for this thread. */
public List<McpToolInvocationSummary> snapshotAndClear() {
List<McpToolInvocationSummary> list = new ArrayList<>(current.get());
current.get().clear();
return list.isEmpty() ? List.of() : Collections.unmodifiableList(list);
}
}
snapshotAndClear() returns the data and resets the buffer in a single call. This is not atomic in the concurrency sense — it does not need to be, because the buffer is confined to one thread. RagService always calls clear() before the LLM call and snapshotAndClear() after, so no state leaks between requests even in error paths — the recorder is always clean for the next request on the same thread.
Layer 3 — Database Audit Log
The Flyway V8 migration adds a nullable JSONB column to the interactions table:
-- MCP tool invocation summaries for a chat turn (nullable when no tools used)
ALTER TABLE interactions ADD COLUMN mcp_invocations jsonb NULL;
The column is NULL for interactions where no tools fired, keeping the table compact
for the common case. The JPA entity maps the column using Hibernate's JSONB type:
@JdbcTypeCode(SqlTypes.JSON)
@Column(columnDefinition = "jsonb")
private List<Map<String, Object>> mcpInvocations; // null when no tools were used
Stored JSON for a two-tool call looks like this:
[
{
"serverId": "powerrag-tools",
"toolName": "jira_search_issues",
"success": true,
"durationMs": 843,
"argsSummary": "{\"jql\": \"project = KAN ORDER BY created DESC\", \"max_results\": 5}"
},
{
"serverId": "powerrag-tools",
"toolName": "jira_get_issue",
"success": true,
"durationMs": 312
}
]
Layer 4 — API Response
RagResponse carries the invocation list alongside the answer and sources, so the
frontend can display tool activity without a separate API call:
public record RagResponse(
String answer,
double confidence,
List<SourceRef> sources,
String modelId,
long durationMs,
UUID interactionId,
boolean cacheHit,
String error,
String generatedImageBase64,
List<McpToolInvocationSummary> mcpInvocations // empty list when no tools fired
) {
public boolean mcpToolsUsed() {
return !mcpInvocations.isEmpty();
}
}
The frontend TypeScript interface mirrors this:
export interface McpToolInvocationSummary {
serverId: string
toolName: string
success: boolean
durationMs: number
errorMessage?: string
argsSummary?: string
}
export interface ChatQueryResponse {
answer: string
confidence: number
sources: SourceRef[]
modelId: string
durationMs: number
interactionId: string
cacheHit: boolean
error?: string
generatedImageBase64?: string
mcpInvocations?: McpToolInvocationSummary[] // undefined when not present in response
}
Layer 5 — Frontend Display
The chat window renders an expandable badge for each message that used tools. The badge uses amber styling to distinguish it from the cyan cache-hit chip and the green confidence indicator:
{(msg.response.mcpInvocations?.length ?? 0) > 0 && (
<details data-testid="mcp-tools-badge">
<summary className="text-xs text-amber-400 border border-amber-700/60 px-2 py-0.5 rounded-full">
MCP · {msg.response.mcpInvocations!.length} {plural}
</summary>
<ul className="mt-2 text-xs text-slate-500 space-y-1">
{msg.response.mcpInvocations!.map((inv, i) => (
<li key={`${inv.toolName}-${i}`}> {/* index in key: the same tool can fire twice in one turn */}
<span className="text-slate-400">{inv.toolName}</span>
<span className="text-slate-600"> · {inv.durationMs}ms</span>
{!inv.success && inv.errorMessage && (
<span className="text-amber-600/90"> — {inv.errorMessage}</span>
)}
</li>
))}
</ul>
</details>
)}
The HTML <details>/<summary> element provides the
expand/collapse behaviour with no JavaScript required — it is a native browser control.
The McpToolsPanel Sidebar
A separate panel component shows which tools are available in the current session. It calls
GET /api/chat/mcp-tools on load and displays the result:
// Shows status: "Not configured" / "Not attached" / list of tool names
// Jira hint: renders a link to the Jira board if jira_* tools are present
function McpToolsPanel({ data }: { data: McpToolsCapabilitiesResponse | undefined }) {
const hasJira = mcpHasJiraTools(data)
return (
<div data-testid="mcp-tools-panel"
className="border border-amber-700/40 rounded bg-amber-950/20 p-3">
{/* Tool list or status message */}
{hasJira && (
<a href={JIRA_BOARD_URL} className="text-xs text-amber-500">
Open Jira board ↗
</a>
)}
</div>
)
}
Confidence Score Adjustment
The confidence scorer takes MCP tool invocations into account. Successful tool calls provide live, authoritative data that supplements or replaces KB retrieval — this can increase the model's effective confidence. Failed tool calls (where the model had to answer without the data it requested) can lower it:
// In responseConfidence():
double confidence = scorer.responseConfidence(
retrievalConfidence, hasRelevantDocs, mcpInvocations);
// mcpInvocations is passed in from the recorded invocation list.
// If tools ran and succeeded, the base confidence is boosted.
// If tools ran but all failed, the confidence is reduced.
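The actual weights live inside the scorer and are not shown here; a hypothetical adjustment rule matching the described behaviour (boost when all tools succeeded, reduce when all failed, with made-up +0.1/-0.2 deltas) might look like:

```java
import java.util.List;

public class ConfidenceAdjustDemo {
    record Invocation(boolean success) {}

    // Hypothetical deltas — the real scorer's values are not given in the text
    static double adjust(double base, List<Invocation> invocations) {
        if (invocations.isEmpty()) return base;                 // no tools fired: unchanged
        long succeeded = invocations.stream().filter(Invocation::success).count();
        if (succeeded == invocations.size()) {
            return Math.min(1.0, base + 0.1);                   // live data arrived: boost
        }
        if (succeeded == 0) {
            return Math.max(0.0, base - 0.2);                   // answered without requested data
        }
        return base;                                            // mixed outcome: leave as-is
    }

    public static void main(String[] args) {
        System.out.println(adjust(0.5, List.of(new Invocation(true))));  // 0.6
        System.out.println(adjust(0.5, List.of(new Invocation(false)))); // 0.3
        System.out.println(adjust(0.5, List.of()));                      // 0.5
    }
}
```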
Test Coverage
The observability layer has dedicated unit tests that verify output normalisation, argument summarisation, and failure recording without needing a live MCP server:
@Test
void normalizesTextContentWrapper() {
String wrapped = "TextContent[annotations=null, text={\"key\":\"value\"}, meta=null]";
String result = ObservingToolCallback.normalizeMcpToolOutput(wrapped);
assertEquals("{\"key\":\"value\"}", result);
}
@Test
void passesPlainJsonThrough() {
String json = "{\"ok\":true,\"text\":\"hello\"}";
assertEquals(json, ObservingToolCallback.normalizeMcpToolOutput(json));
}
@Test
void recordsFailureAndRethrows() {
ToolCallback failing = mock(ToolCallback.class);
ToolDefinition def = mock(ToolDefinition.class);
when(def.name()).thenReturn("jira_search_issues"); // the wrapper reads the tool name before calling
when(failing.getToolDefinition()).thenReturn(def);
when(failing.call(any())).thenThrow(new RuntimeException("timeout"));
ObservingToolCallback obs = new ObservingToolCallback(failing, recorder);
assertThrows(RuntimeException.class, () -> obs.call("{}"));
List<McpToolInvocationSummary> recorded = recorder.snapshotAndClear();
assertEquals(1, recorded.size());
assertFalse(recorded.get(0).success());
assertEquals("timeout", recorded.get(0).errorMessage());
}
Frontend test IDs provide stable anchors for E2E assertions:
| Test ID | Component | What it selects |
|---|---|---|
| mcp-tools-badge | ChatWindow | The expandable MCP invocation summary on each message |
| mcp-tools-panel | McpToolsPanel | The sidebar panel listing available tools |
| mcp-tools-toggle | McpToolsPanel | The show/hide toggle button inside the panel |
Finally, the JSONB column supports ad-hoc audit queries. For example, this finds every interaction where at least one tool call failed:
SELECT mcp_invocations FROM interactions WHERE mcp_invocations IS NOT NULL AND mcp_invocations @> '[{"success":false}]';