The Full RAG Pipeline

Module 5 · ~15 min read

This topic is a stage-by-stage walkthrough of RagService.query() — the entry point for every chat request. Every stage is covered in execution order, from 0 through 8 plus sub-stage 1.5.

RagService.java — full pipeline

Stage 0: Input Guardrail

Input safety check via Gemini 2.5 Flash
GuardrailService.checkInput(question) sends the user's message to Google Gemini (gemini-2.5-flash by default). The model classifies the text as safe or unsafe; categories are parsed from the model reply.

BLOCK path: If the input is flagged, the service returns a rejection response and logs a record to the guardrail_flags table. The request never reaches retrieval or the LLM.

PASS path: The classification returns "safe" and execution continues.
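
The branch can be sketched as below. GuardrailResult, the method shapes, and the placeholder keyword rule are all illustrative; the real checkInput is an LLM call to Gemini that returns a safety classification, not a string match.

```java
public class GuardrailSketch {
    // Illustrative result type; the real service parses categories
    // out of the Gemini reply.
    public record GuardrailResult(boolean safe, String category) {}

    // Placeholder stand-in for GuardrailService.checkInput: the real
    // implementation sends the text to gemini-2.5-flash for classification.
    static GuardrailResult checkInput(String question) {
        boolean unsafe = question.toLowerCase().contains("attack");
        return new GuardrailResult(!unsafe, unsafe ? "violence" : "safe");
    }

    static String handle(String question) {
        GuardrailResult result = checkInput(question);
        if (!result.safe()) {
            // BLOCK path: log to guardrail_flags, return a rejection,
            // never touch retrieval or the LLM.
            return "Sorry, I can't help with that. (" + result.category() + ")";
        }
        // PASS path: continue to stage 1.
        return "PASS";
    }
}
```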

Stage 1: Semantic Cache Lookup

Redis vector cache lookup
semanticCache.lookup(question, lang) embeds the question and searches the Redis vector index for a cached answer with cosine similarity ≥ 0.92.

HIT path: Return the cached answer, sources, and confidence immediately. A cache hit skips stages 1.5 through 8. Typical savings: 2–10 seconds of LLM latency.

MISS path: Continue to stage 1.5.
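
The hit/miss decision reduces to a cosine-similarity threshold on the question embeddings. A minimal sketch (Redis computes this inside the vector index; the method names here are not the real SemanticCache API):

```java
public class CosineLookup {
    // Similarity threshold from the cache description above.
    static final double THRESHOLD = 0.92;

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // A cached entry counts as a HIT only at similarity >= 0.92.
    static boolean isHit(double[] queryEmbedding, double[] cachedEmbedding) {
        return cosine(queryEmbedding, cachedEmbedding) >= THRESHOLD;
    }
}
```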

Stage 1.5: Image Generation Detection

Detect image generation intent
imageGenerationService.isImageGenerationRequest(question) checks for generation verb + image noun pairs ("generate a picture of...").

Detected: Calls Imagen 3 (or Gemini Flash fallback), returns the image as a base64 string in generatedImageBase64. Pipeline exits after image generation — no RAG retrieval needed.

Not detected: Continue to stage 2.
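
A verb + noun pair check of this kind can be approximated with a single regex. The word lists below are assumptions for illustration; the real detector's vocabulary may differ.

```java
import java.util.regex.Pattern;

public class ImageIntent {
    // Hypothetical pattern: a generation verb followed somewhere
    // later in the sentence by an image noun.
    static final Pattern INTENT = Pattern.compile(
        "\\b(generate|create|draw|make)\\b.*\\b(image|picture|photo|illustration)\\b",
        Pattern.CASE_INSENSITIVE);

    static boolean isImageGenerationRequest(String question) {
        return INTENT.matcher(question).find();
    }
}
```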

Stage 2: Hybrid Retrieval

Dense + keyword search → RRF merge
HybridRetriever.retrieve(question) runs two searches in parallel:
• Dense: vectorStore.similaritySearch(topK*2) via Qdrant
• Keyword: documentChunkRepository.fullTextSearch(topK*2) via PostgreSQL FTS

Results are merged with Reciprocal Rank Fusion (k=60) and capped at topK (default: 10).
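
Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over the result lists it appears in. A self-contained sketch of the merge step (the real HybridRetriever works on chunk objects, not strings):

```java
import java.util.*;

public class RrfMerge {
    static final int K = 60;

    // score(d) = sum over lists of 1 / (K + rank_d), ranks starting at 1.
    static Map<String, Double> fuse(List<String> dense, List<String> keyword) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : List.of(dense, keyword)) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (K + i + 1), Double::sum);
            }
        }
        return scores;
    }

    // Sort by fused score, descending, and cap at topK.
    static List<String> topK(List<String> dense, List<String> keyword, int topK) {
        return fuse(dense, keyword).entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(topK)
            .map(Map.Entry::getKey)
            .toList();
    }
}
```

With k = 60, a document that appears in both lists almost always outranks one that appears in only one, which is exactly the behaviour hybrid retrieval wants.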

Stage 3: Confidence Scoring

Assess retrieval relevance
ConfidenceScorer.score(chunks) averages the RRF scores of the top chunks and normalises to 0–1.

If confidence < 0.1 or chunks are empty: hasRelevantDocs = false. The prompt will tell the LLM to answer from general knowledge without citing sources.
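
The exact normalisation isn't shown in this topic, so the sketch below invents a plausible one: it treats a chunk ranked first in both result lists (RRF score 2/(60+1)) as the ceiling and clamps the average to [0, 1]. MAX_RRF and the formula are assumptions, not the real ConfidenceScorer.

```java
import java.util.List;

public class ConfidenceSketch {
    // Assumed ceiling: a chunk ranked #1 in both the dense and
    // keyword lists scores 1/61 + 1/61 = 2/61 under RRF with k=60.
    static final double MAX_RRF = 2.0 / 61.0;

    // Average the chunks' RRF scores and normalise to 0..1.
    static double score(List<Double> rrfScores) {
        if (rrfScores.isEmpty()) return 0.0;
        double avg = rrfScores.stream().mapToDouble(Double::doubleValue)
                              .average().orElse(0.0);
        return Math.min(1.0, avg / MAX_RRF);
    }

    // Mirrors hasRelevantDocs: empty retrieval or confidence below 0.1
    // means the LLM answers from general knowledge, without citations.
    static boolean hasRelevantDocs(List<Double> rrfScores) {
        return !rrfScores.isEmpty() && score(rrfScores) >= 0.1;
    }
}
```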

Stage 4: Context Assembly

Format chunks as [SOURCE N] blocks
ContextAssembler.assemble(chunks) formats each retrieved chunk as a [SOURCE N] filename § section header line followed by the chunk text and a blank line, with the assembled context capped at 24,000 characters.

ContextAssembler.extractSources(chunks) builds the List<SourceRef> for the response.
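
A minimal sketch of the assembly step, assuming a simple Chunk shape (the real chunk type and the exact truncation behaviour at the 24,000-character cap may differ):

```java
import java.util.List;

public class ContextSketch {
    static final int MAX_CHARS = 24_000;

    // Assumed chunk shape for illustration.
    record Chunk(String filename, String section, String text) {}

    static String assemble(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        int n = 1;
        for (Chunk c : chunks) {
            String block = "[SOURCE " + n + "] " + c.filename() + " § " + c.section()
                + "\n" + c.text() + "\n\n";
            // Stop once the next block would exceed the context cap.
            if (sb.length() + block.length() > MAX_CHARS) break;
            sb.append(block);
            n++;
        }
        return sb.toString();
    }
}
```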

Stage 5: LLM Call

Build prompt → call LLM → receive answer
MultilingualPromptBuilder.buildUserMessage() assembles the final user message: image instruction (if any) + context block + question + language instruction.

resolveClient(provider, modelId) selects the correct ChatClient bean. Model-specific options are injected if needed.

baseSpec.user(userMessage).call().content() executes the LLM call and returns the raw answer text.
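
A rough sketch of the message-assembly order described above. The section labels ("Context:", "Question:") are assumptions; the real MultilingualPromptBuilder may format each part differently.

```java
public class PromptSketch {
    // Concatenate the four parts in pipeline order:
    // image instruction (optional) + context block + question + language instruction.
    static String buildUserMessage(String imageInstruction, String context,
                                   String question, String langInstruction) {
        StringBuilder sb = new StringBuilder();
        if (imageInstruction != null) sb.append(imageInstruction).append("\n\n");
        if (!context.isEmpty()) sb.append("Context:\n").append(context).append("\n\n");
        sb.append("Question: ").append(question).append("\n");
        sb.append(langInstruction);
        return sb.toString();
    }
}
```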

Stage 6: Output Guardrail

PII detection and redaction
GuardrailService.checkOutput(answer) applies regex patterns for email addresses, SSNs, and credit card numbers.

If PII is detected: the PII is redacted from the answer and a flag is logged. The redacted answer is returned to the user.
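
The redaction pass can be sketched with one regex per category. These patterns are approximations of the three categories named above, not the service's actual expressions:

```java
import java.util.regex.Pattern;

public class PiiRedactor {
    // Approximate patterns for the three PII categories.
    static final Pattern EMAIL = Pattern.compile(
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    static final Pattern CARD = Pattern.compile("\\b(?:\\d[ -]?){13,16}\\b");

    // Replace each match with a labelled placeholder; the redacted
    // answer (not the original) is what reaches the user.
    static String redact(String answer) {
        String out = EMAIL.matcher(answer).replaceAll("[REDACTED EMAIL]");
        out = SSN.matcher(out).replaceAll("[REDACTED SSN]");
        out = CARD.matcher(out).replaceAll("[REDACTED CARD]");
        return out;
    }
}
```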

Stage 7: Cache Store

Store answer in Redis semantic cache
semanticCache.store(question, lang, answer, confidence, sources, modelId) embeds the question and stores the answer with a 24-hour TTL in the Redis vector index.

Future queries with semantically similar questions (≥ 0.92 similarity) will hit this cache entry.
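
Redis enforces the 24-hour expiry natively via key TTLs; the in-memory sketch below only illustrates the store-then-expire semantics, and every name in it is hypothetical.

```java
import java.util.*;

public class TtlCacheSketch {
    static final long TTL_MILLIS = 24 * 60 * 60 * 1000L;

    record Entry(String answer, long expiresAt) {}

    final Map<String, Entry> entries = new HashMap<>();

    // Store an answer with a 24-hour expiry from "now".
    void store(String key, String answer, long now) {
        entries.put(key, new Entry(answer, now + TTL_MILLIS));
    }

    // Return the answer only while it is still within its TTL.
    Optional<String> lookup(String key, long now) {
        Entry e = entries.get(key);
        if (e == null || now >= e.expiresAt()) return Optional.empty();
        return Optional.of(e.answer());
    }
}
```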

Stage 8: Audit Log

Persist interaction to PostgreSQL
interactionRepository.save(interaction) writes a record to the interactions table containing: user ID, question, answer, model used, confidence, source count, language, duration, and timestamp.

This data powers usage analytics, quality monitoring, and compliance auditing.

Timing Summary

After stage 8, the total elapsed time is logged at INFO level:

Log output — successful RAG query
INFO  RAG query completed in 3241ms — provider=ANTHROPIC model=claude-sonnet-4-6
      confidence=0.72 sources=5 lang=en