Confidence Scoring

Module 4 · ~6 min read
A low confidence score means your knowledge base doesn't have relevant documents for the query. The LLM still answers — using its training data — but won't cite non-existent sources or produce hallucinated document references. Confidence scoring enables graceful fallback rather than a hard failure.

What Confidence Measures

Confidence is a proxy for retrieval relevance: how strongly do the top retrieved chunks relate to the user's question? Power RAG uses the RRF combined scores as the signal — high RRF scores mean both retrieval methods agreed the chunks are relevant.
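To make the signal concrete, here is a minimal sketch of the standard reciprocal rank fusion formula (score = Σ 1/(k + rank) over the result lists, with the common default k = 60). The class name and chunk ids are illustrative, not from Power RAG:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfDemo {
    // Standard RRF: each ranked list contributes 1 / (k + rank) per item.
    static Map<String, Double> rrf(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                // rank is 1-based, so the top item gets 1 / (k + 1)
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        var vectorHits  = List.of("c1", "c2", "c3"); // hypothetical chunk ids
        var keywordHits = List.of("c2", "c1", "c4");
        // c1 and c2 rank highly in BOTH lists, so their combined
        // scores come out well above c3 and c4 (found by one method only)
        System.out.println(rrf(List.of(vectorHits, keywordHits), 60));
    }
}
```

Chunks that both retrieval methods rank near the top accumulate two large contributions, which is why a high combined score is a reasonable relevance proxy.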

ConfidenceScorer

ConfidenceScorer.java
import java.util.List;

import org.springframework.stereotype.Component;

@Component
public class ConfidenceScorer {

    /** Averages the RRF scores of the retrieved chunks and rescales to a 0–1 confidence. */
    public double score(List<RetrievedChunk> chunks) {
        if (chunks == null || chunks.isEmpty()) return 0.0;
        double avg = chunks.stream()
            .mapToDouble(RetrievedChunk::score)
            .average()
            .orElse(0.0);
        return Math.min(1.0, avg * 100); // normalise RRF micro-values to 0–1
    }
}

RRF scores are very small numbers (typically 0.01–0.03). Multiplying by 100 maps them onto a human-readable 0–1 scale, and the Math.min cap keeps the result from exceeding 1.0.
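The scorer's behaviour can be checked with a few hand-picked inputs. This is a self-contained sketch; the nested RetrievedChunk record is an assumed stand-in for the pipeline's real chunk type:

```java
import java.util.List;

public class ConfidenceScorerDemo {
    // Assumed shape of the pipeline's chunk type, for illustration only.
    record RetrievedChunk(String id, double score) {}

    // Same logic as ConfidenceScorer.score: average the RRF scores, scale, cap at 1.0.
    static double score(List<RetrievedChunk> chunks) {
        if (chunks == null || chunks.isEmpty()) return 0.0;
        double avg = chunks.stream()
            .mapToDouble(RetrievedChunk::score)
            .average()
            .orElse(0.0);
        return Math.min(1.0, avg * 100);
    }

    public static void main(String[] args) {
        // Hypothetical RRF scores: avg = 0.003, so confidence = 0.3
        var weak = List.of(new RetrievedChunk("a", 0.002), new RetrievedChunk("b", 0.004));
        System.out.println(score(weak));

        // Empty or missing results short-circuit to 0.0
        System.out.println(score(List.<RetrievedChunk>of()));
        System.out.println(score(null));
    }
}
```

Note that the null/empty guard runs before the stream, so a failed retrieval yields a clean 0.0 rather than an exception.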

The 0.1 Threshold Decision

RagService.java — confidence threshold check
boolean hasRelevantDocs = confidence >= 0.1 && !chunks.isEmpty();
if (!hasRelevantDocs) {
    // Fall back to LLM general knowledge
    log.info("No relevant docs (confidence={}), using general knowledge", confidence);
}

When hasRelevantDocs is false, the pipeline skips context injection: the LLM answers from its general training data and the response carries no citations.
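The two branches can be sketched as a prompt-building helper. This is hypothetical; only the 0.1 threshold comes from RagService, and the class, method names, and prompt wording are illustrative:

```java
import java.util.List;
import java.util.stream.Collectors;

public class FallbackDemo {
    // Assumed shape of the pipeline's chunk type, for illustration only.
    record RetrievedChunk(String text, double score) {}

    static boolean hasRelevantDocs(double confidence, List<RetrievedChunk> chunks) {
        return confidence >= 0.1 && chunks != null && !chunks.isEmpty();
    }

    // Builds the prompt for whichever branch applies.
    static String buildPrompt(double confidence, List<RetrievedChunk> chunks, String question) {
        if (!hasRelevantDocs(confidence, chunks)) {
            // Fallback branch: no context, and the model is told not to cite documents
            return "Answer from your general knowledge. Do not cite documents.\n\nQ: " + question;
        }
        String context = chunks.stream()
            .map(RetrievedChunk::text)
            .collect(Collectors.joining("\n---\n"));
        return "Answer using only the context below, with citations.\n" + context + "\n\nQ: " + question;
    }

    public static void main(String[] args) {
        var chunks = List.of(new RetrievedChunk("Invoices are due in 30 days.", 0.004));
        System.out.println(buildPrompt(0.40, chunks, "When are invoices due?"));
        System.out.println(buildPrompt(0.02, List.of(), "When are invoices due?"));
    }
}
```

Instructing the model explicitly not to cite documents in the fallback branch is what prevents hallucinated source references when retrieval comes up empty.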

Confidence Score Bands

Score range    Interpretation                                               Pipeline action
0.00 – 0.09    No relevant documents found                                  Use general LLM knowledge, no citations
0.10 – 0.39    Weak relevance — some related content                        Use context with citations, note uncertainty
0.40 – 0.69    Moderate relevance                                           Use context with citations
0.70 – 1.00    High relevance — documents directly answer the question      Use context with citations
The confidence score is also stored in the audit log and returned to the frontend. You can use it to display a visual confidence indicator in the UI — helping users understand when the answer comes from your documents vs. the LLM's general training.