ChatClient Basics

Module 2 · ~8 min read
ChatClient is the main Spring AI abstraction for sending prompts to an LLM. It uses a fluent builder API that separates prompt construction from execution, making the call chain easy to read and test.

Three Ways to Call

Spring AI's ChatClient supports three response modes:

Call style              Returns         When to use
.call().content()       String          Simple text response — the most common case
.call().chatResponse()  ChatResponse    When you need metadata: token counts, finish reason, model name
.stream().content()     Flux<String>    Server-sent events / streaming to the frontend as chunks arrive
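The three styles can be sketched side by side. This is an illustrative fragment, not code from the project: `chatClient` is assumed to be an injected ChatClient bean, and the metadata accessor names follow recent Spring AI releases and may differ slightly by version.

```java
// 1. Plain text — the most common case
String text = chatClient.prompt()
        .user("Summarize RAG in one sentence.")
        .call()
        .content();

// 2. Full ChatResponse — when you need metadata
ChatResponse response = chatClient.prompt()
        .user("Summarize RAG in one sentence.")
        .call()
        .chatResponse();
Usage usage = response.getMetadata().getUsage();   // token counts
String model = response.getMetadata().getModel();  // model name

// 3. Streaming — a Flux that emits text chunks as they arrive
Flux<String> chunks = chatClient.prompt()
        .user("Summarize RAG in one sentence.")
        .stream()
        .content();
```

Note that the first two styles block the calling thread until the full response is back, while the streaming style returns immediately and delivers chunks reactively.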

Basic Usage

Basic ChatClient call
String answer = chatClient.prompt()
    .user("What is RAG?")
    .call()
    .content();

The fluent chain is:

  1. .prompt() — starts a new request spec
  2. .user("...") — sets the user message
  3. .call() — executes the request (blocking)
  4. .content() — extracts the text from the response

System Prompt with defaultSystem()

A system prompt provides persistent instructions that apply to every call made through a ChatClient instance. Set it once at bean creation with defaultSystem():

Setting a default system prompt at build time
ChatClient client = ChatClient.builder(model)
    .defaultSystem("You are a helpful assistant that always cites sources.")
    .build();

You can also override the system prompt per-request using .system("...") in the fluent chain. This is useful when the same ChatClient bean serves different use cases with different instructions.
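A per-request override might look like this (a sketch — `client` is the bean built above, and the prompt text is illustrative):

```java
// .system(...) in the fluent chain replaces the default system prompt
// for this one request only; other calls still use defaultSystem().
String answer = client.prompt()
        .system("You are a terse assistant. Answer in one sentence, no citations.")
        .user("What is RAG?")
        .call()
        .content();
```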

How Power RAG Uses ChatClient

In RagService, the actual LLM call happens in the callLlm() helper method. By the time this method is called, the prompt has already been fully assembled (context + question + instructions) by MultilingualPromptBuilder.

RagService.java — callLlm()
String rawAnswer = baseSpec.user(userMessage).call().content();

Here baseSpec is a ChatClient.ChatClientRequestSpec that has already been configured with the correct provider, model options, and any images — see Topic 06 for how baseSpec is built dynamically.
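To make the shape of that concrete, here is a hypothetical sketch of a pre-configured request spec — the real selection logic lives in the code covered in Topic 06, and the model name, temperature, and `ChatOptions` builder usage below are illustrative assumptions:

```java
// Hypothetical: a request spec configured up front with portable options.
// callLlm() only has to attach the user message and execute.
ChatClient.ChatClientRequestSpec baseSpec = chatClient.prompt()
        .options(ChatOptions.builder()
                .model("gpt-4o-mini")   // illustrative model id
                .temperature(0.2)
                .build());

// The line from RagService.callLlm() then completes the spec:
String rawAnswer = baseSpec.user(userMessage).call().content();
```

Splitting configuration from execution this way keeps callLlm() a one-liner regardless of which provider or options were chosen upstream.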

Prefer .call().content() for RAG responses — you want the complete answer before returning it to the cache and audit log. Use .stream() only when building a real-time chat UI that needs to display tokens as they arrive.
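For the real-time UI case, a minimal streaming endpoint could look like the following sketch, assuming Spring WebFlux; the route and request parameter are invented for illustration:

```java
// Hypothetical SSE endpoint: each emitted String becomes one event,
// so the browser renders tokens as the model produces them.
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String question) {
    return chatClient.prompt()
            .user(question)
            .stream()
            .content();
}
```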

ChatClient vs ChatModel

Spring AI has two layers: ChatModel (the low-level provider wrapper) and ChatClient (the high-level fluent API). You should always use ChatClient in application code: