Dynamic Model Routing
The Power RAG frontend lets users choose their LLM provider and model at runtime. The backend must resolve the correct
ChatClient bean and — for providers like Ollama and Gemini that share a single underlying bean — inject the correct model identifier per request via ChatOptions.
The Challenge
Spring beans are created at startup. You cannot create a new ChatClient bean for every possible Ollama model. The solution is two-level routing:
- Resolve the client bean — select the correct pre-built ChatClient for the provider
- Override the model per request — inject the specific model ID via ChatOptions on the request spec
resolveClient()
RagService.java — resolveClient()
View source ↗
private ChatClient resolveClient(String provider, String modelId) {
    // Shared base beans — the specific model is injected later via ChatOptions
    if ("OLLAMA".equalsIgnoreCase(provider)) return ollamaBaseClient;
    if ("GEMINI".equalsIgnoreCase(provider)) return geminiBaseClient;
    // Anthropic: one pre-built client per model, keyed "PROVIDER:model-id"
    ChatClient c = clientsByKey.get(provider.toUpperCase() + ":" + modelId);
    if (c != null) return c;
    return clientsByKey.get("ANTHROPIC:claude-sonnet-4-6"); // default fallback
}
The logic:
- Ollama and Gemini share one bean each — the specific model is set later via options
- Anthropic clients are registered in a Map<String, ChatClient> keyed as "ANTHROPIC:model-id"
- If no match is found, fall back to the default Claude Sonnet bean
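The lookup-and-fallback logic can be exercised standalone. In this minimal sketch the ChatClient beans are stood in by strings so the routing runs without Spring; the map name, key format, and fallback key come from the source, while the bean names and the second Anthropic entry are illustrative placeholders:

```java
import java.util.Map;

public class RoutingSketch {
    // Stand-ins for the real ChatClient beans (names are assumptions)
    static final String OLLAMA_BASE = "ollamaBaseClient";
    static final String GEMINI_BASE = "geminiBaseClient";
    static final String DEFAULT_KEY = "ANTHROPIC:claude-sonnet-4-6";

    // Anthropic clients registered per model, keyed "PROVIDER:model-id"
    // (the second entry is a hypothetical example)
    static final Map<String, String> clientsByKey = Map.of(
            "ANTHROPIC:claude-sonnet-4-6", "claudeSonnetClient",
            "ANTHROPIC:claude-haiku", "claudeHaikuClient");

    static String resolveClient(String provider, String modelId) {
        // Shared base beans: the model is set later via options
        if ("OLLAMA".equalsIgnoreCase(provider)) return OLLAMA_BASE;
        if ("GEMINI".equalsIgnoreCase(provider)) return GEMINI_BASE;
        String c = clientsByKey.get(provider.toUpperCase() + ":" + modelId);
        return c != null ? c : clientsByKey.get(DEFAULT_KEY); // graceful fallback
    }

    public static void main(String[] args) {
        System.out.println(resolveClient("ollama", "qwen2.5-coder:32b")); // ollamaBaseClient
        System.out.println(resolveClient("ANTHROPIC", "claude-haiku"));   // claudeHaikuClient
        System.out.println(resolveClient("MISTRAL", "anything"));         // claudeSonnetClient
    }
}
```

The case-insensitive provider check plus the uppercased key means the frontend can send "ollama" or "OLLAMA" and still route correctly.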
Dynamic Model Override with ChatOptions
Once the right client is selected, the specific model ID is injected via provider-specific ChatOptions. This overrides the model that was configured in application.yml for that particular request only — the next request starts fresh.
RagService.java — per-request model options
View source ↗
if ("OLLAMA".equalsIgnoreCase(provider) && modelId != null) {
baseSpec = baseSpec.options(
OllamaChatOptions.builder().model(modelId).build());
} else if ("GEMINI".equalsIgnoreCase(provider) && modelId != null) {
baseSpec = baseSpec.options(
GoogleGenAiChatOptions.builder().model(modelId).build());
}
Gemini and Ollama share one underlying model bean each. The specific model is injected per-request via
options(), keeping the bean count manageable. This avoids the need to pre-register a separate bean for every possible Ollama model (there could be dozens installed locally).
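The "next request starts fresh" guarantee follows from the fluent API returning a new request spec rather than mutating the shared bean. A toy sketch of that immutability contract (the types here are hypothetical stand-ins, not the actual Spring AI classes):

```java
// Toy model of a fluent, immutable request spec: options() returns a new
// spec, so a per-request model override never leaks into the shared base client.
record Spec(String client, String model) {
    Spec options(String model) { return new Spec(client, model); } // no mutation
}

public class PerRequestOverride {
    public static void main(String[] args) {
        Spec base = new Spec("ollamaBaseClient", "default-model"); // as configured at startup
        Spec req  = base.options("qwen2.5-coder:32b");             // this request only
        System.out.println(req.model());   // qwen2.5-coder:32b
        System.out.println(base.model());  // default-model (untouched)
    }
}
```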
The Full Routing Flow
Request: provider="OLLAMA", modelId="qwen2.5-coder:32b"
│
▼
resolveClient("OLLAMA", "qwen2.5-coder:32b")
│
│ returns the shared ollamaBaseClient bean
▼
baseSpec = ollamaBaseClient.prompt()
│
│ override model for this request
▼
baseSpec = baseSpec.options(
OllamaChatOptions.builder().model("qwen2.5-coder:32b").build())
│
▼
baseSpec.user(userMessage).call().content()
│
▼
Ollama API → qwen2.5-coder:32b → answer
Default Fallback
If the frontend sends an unknown provider/model combination, resolveClient() returns the Anthropic Claude Sonnet bean. This ensures the pipeline never breaks — it degrades gracefully to the primary LLM.
When adding a new Anthropic model, register it in clientsByKey with the key "ANTHROPIC:new-model-id". For Ollama and Gemini, no code change is needed — the model ID is passed at request time.
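Registration is a single map entry. Using the same string stand-ins for ChatClient beans as above (the "new-model-id" key mirrors the source's placeholder; the bean name is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class RegisterModel {
    public static void main(String[] args) {
        // clientsByKey with ChatClient values stood in by strings
        Map<String, String> clientsByKey = new HashMap<>();
        clientsByKey.put("ANTHROPIC:claude-sonnet-4-6", "claudeSonnetClient");

        // Adding a new Anthropic model: one keyed entry, nothing else changes
        clientsByKey.put("ANTHROPIC:new-model-id", "newModelClient");

        System.out.println(clientsByKey.containsKey("ANTHROPIC:new-model-id")); // true
    }
}
```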