ChatClient Basics
Three Ways to Call
Spring AI's ChatClient supports three response modes:
| Call style | Returns | When to use |
|---|---|---|
| `.call().content()` | `String` | Simple text response — the most common case |
| `.call().chatResponse()` | `ChatResponse` | When you need metadata: token counts, finish reason, model name |
| `.stream().content()` | `Flux<String>` | Server-sent events / streaming to the frontend as chunks arrive |
Basic Usage
```java
String answer = chatClient.prompt()
        .user("What is RAG?")
        .call()
        .content();
```
The fluent chain is:

- `.prompt()` starts a new request spec
- `.user("...")` sets the user message
- `.call()` executes the request (blocking)
- `.content()` extracts the text from the response
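The streaming variant follows the same chain, swapping `.call()` for `.stream()` and yielding a Reactor `Flux`. A sketch of exposing it as server-sent events from a Spring WebFlux controller (the endpoint path and `chatClient` field are illustrative):

```java
@GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String question) {
    return chatClient.prompt()
            .user(question)
            .stream()
            .content();   // each emitted String is one chunk of the answer
}
```

Because the return type is `Flux<String>`, WebFlux handles backpressure and flushes each chunk to the client as the model produces it.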
System Prompt with defaultSystem()
A system prompt provides persistent instructions that apply to every call made through a `ChatClient` instance. Set it once at bean creation with `defaultSystem()`:
```java
ChatClient client = ChatClient.builder(model)
        .defaultSystem("You are a helpful assistant that always cites sources.")
        .build();
```
You can also override the system prompt per-request using `.system("...")` in the fluent chain. This is useful when the same `ChatClient` bean serves different use cases with different instructions.
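A per-request override replaces the default for that call only. A hedged sketch, reusing the `client` built above (the prompt texts are illustrative):

```java
// Default system prompt ("always cites sources") applies here
String cited = client.prompt()
        .user("Summarize the key findings on retrieval-augmented generation.")
        .call()
        .content();

// Override just for this request; the bean's default is untouched
String casual = client.prompt()
        .system("You are a playful assistant. Keep answers under two sentences.")
        .user("Summarize the key findings on retrieval-augmented generation.")
        .call()
        .content();
```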
How Power RAG Uses ChatClient
In `RagService`, the actual LLM call happens in the `callLlm()` helper method. By the time this method is called, the prompt has already been fully assembled (context + question + instructions) by `MultilingualPromptBuilder`.
```java
String rawAnswer = baseSpec.user(userMessage).call().content();
```
Here `baseSpec` is a `ChatClient.ChatClientRequestSpec` that has already been configured with the correct provider, model options, and any images — see Topic 06 for how `baseSpec` is built dynamically.
Use `.call().content()` for RAG responses: you want the complete answer before returning it to the cache and audit log. Use `.stream()` only when building a real-time chat UI that needs to display tokens as they arrive.

ChatClient vs ChatModel
Spring AI has two layers: `ChatModel` (the low-level provider wrapper) and `ChatClient` (the high-level fluent API). You should always use `ChatClient` in application code:
- `ChatModel` takes `Prompt` objects and returns `ChatResponse`. Lower-level, more verbose.
- `ChatClient` is a fluent builder wrapping `ChatModel`. It supports `defaultSystem()`, `defaultOptions()`, advisors, and structured output. Use this.
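To make the difference concrete, here is the same request at both layers (a sketch; assumes a `ChatModel` bean named `chatModel` is available):

```java
// Low level, ChatModel: build a Prompt yourself, unwrap the ChatResponse yourself
ChatResponse response = chatModel.call(new Prompt("What is RAG?"));
String viaModel = response.getResult().getOutput().getText();

// High level, ChatClient: fluent, with defaults and advisors available
String viaClient = ChatClient.create(chatModel)
        .prompt()
        .user("What is RAG?")
        .call()
        .content();
```

`ChatClient.create(chatModel)` is a shorthand for `ChatClient.builder(chatModel).build()` when no defaults are needed; in an application you would normally inject a preconfigured `ChatClient` bean instead.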