Semantic Caching

Module 5 · ~10 min read
A traditional cache keys on exact string equality, so "What is RAG?" and "Can you explain RAG?" are two different keys, and the second question misses even after the first has been answered. A semantic cache keys on meaning: if the embedding vectors of two questions have cosine similarity ≥ 0.92, the questions are treated as equivalent and the cached answer is returned. Paraphrased questions that an exact-match cache would miss become hits.
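The equivalence test described above can be sketched in a few lines. This is a minimal illustration, not the project's code: the class name `CosineSimilarityDemo` and the toy 3-dimensional vectors are assumptions made for the example, and only the 0.92 threshold comes from the text.

```java
// Minimal sketch: cosine similarity between two embedding vectors,
// compared against the 0.92 threshold. Class and method names are hypothetical.
final class CosineSimilarityDemo {

    static final double THRESHOLD = 0.92;

    /** Returns cos(a, b) = (a . b) / (|a| * |b|). */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector is treated as similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** True when the two vectors are close enough to share a cached answer. */
    static boolean isEquivalent(float[] a, float[] b) {
        return cosine(a, b) >= THRESHOLD;
    }

    public static void main(String[] args) {
        // Toy vectors for illustration; real embeddings have far more dimensions.
        float[] q1 = {1.0f, 0.0f, 0.0f};
        float[] q2 = {0.9f, 0.1f, 0.0f}; // almost the same direction as q1
        float[] q3 = {0.0f, 1.0f, 0.0f}; // orthogonal to q1
        System.out.println(isEquivalent(q1, q2)); // true  (cosine is about 0.994)
        System.out.println(isEquivalent(q1, q3)); // false (cosine is 0.0)
    }
}
```

Cosine similarity depends only on the angle between the vectors, not on their length, which is why it is a common choice for comparing embeddings.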

The SemanticCache Interface

SemanticCache.java
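import java.util.List;
import java.util.Optional;

public interface SemanticCache {
    // Returns a hit when a semantically equivalent query is cached for this language.
    Optional<CacheHit> lookup(String query, String language);
    // Caches an answer together with its confidence, sources, and generating model.
    void store(String query, String language, String answer,
               double confidence, List<SourceRef> sources, String modelId);
}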
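A caller would typically use these two methods in a cache-aside pattern: look up first, and only on a miss run the pipeline and store the result. The sketch below illustrates that flow under several assumptions: the shapes of `CacheHit` and `SourceRef` are not shown in the source, so the records here are hypothetical, and `InMemoryCache` is an exact-match stand-in that exists only so the example runs without Redis.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

final class CacheAsideDemo {

    // Hypothetical shapes: the source does not show CacheHit or SourceRef.
    record SourceRef(String documentId) {}
    record CacheHit(String answer, double confidence, List<SourceRef> sources, String modelId) {}

    // Same two methods as the SemanticCache interface above, redeclared for self-containment.
    interface SemanticCache {
        Optional<CacheHit> lookup(String query, String language);
        void store(String query, String language, String answer,
                   double confidence, List<SourceRef> sources, String modelId);
    }

    /** Exact-match stand-in so the sketch runs without Redis; it is not semantic. */
    static final class InMemoryCache implements SemanticCache {
        private final Map<String, CacheHit> entries = new HashMap<>();

        @Override
        public Optional<CacheHit> lookup(String query, String language) {
            return Optional.ofNullable(entries.get(language + "|" + query));
        }

        @Override
        public void store(String query, String language, String answer,
                          double confidence, List<SourceRef> sources, String modelId) {
            entries.put(language + "|" + query,
                        new CacheHit(answer, confidence, sources, modelId));
        }
    }

    /** Checks the cache first; on a miss, runs the pipeline and stores its answer. */
    static String answer(SemanticCache cache, String query, String language,
                         BiFunction<String, String, String> pipeline) {
        Optional<CacheHit> hit = cache.lookup(query, language);
        if (hit.isPresent()) {
            return hit.get().answer(); // pipeline is skipped entirely
        }
        String fresh = pipeline.apply(query, language);
        // Confidence, sources, and model id are placeholder values for the sketch.
        cache.store(query, language, fresh, 1.0, List.of(), "hypothetical-model");
        return fresh;
    }

    public static void main(String[] args) {
        SemanticCache cache = new InMemoryCache();
        int[] pipelineCalls = {0};
        BiFunction<String, String, String> pipeline = (q, lang) -> {
            pipelineCalls[0]++;
            return "answer to: " + q;
        };
        answer(cache, "What is RAG?", "en", pipeline);
        answer(cache, "What is RAG?", "en", pipeline);
        System.out.println("pipeline calls: " + pipelineCalls[0]); // pipeline calls: 1
    }
}
```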

How Redis Vector Search Works

Power RAG uses Redis Stack 7.x with the RedisVectorStore from Spring AI. The lookup process:

1. Embed the query: the shared EmbeddingModel (gemini-embedding-001, 768 dimensions) converts the query string into a float vector.
2. Search the Redis index: the powerrag:cache:{lang} Redis index is searched for the nearest-neighbour vector using cosine similarity.
3. Threshold check: if the nearest neighbour has cosine similarity ≥ 0.92 → return the cached answer as a CacheHit. Otherwise → miss, return Optional.empty().
4. Return or continue: on a HIT, the full pipeline (retrieval, LLM call, guardrails) is bypassed entirely. Typical latency: <50ms vs 2–10s for a full call.
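The four steps can be sketched without Redis by replacing the index with a list and the nearest-neighbour search with a brute-force scan. Everything here is a stand-in chosen for illustration: `BruteForceSemanticCache` is a hypothetical class, and `letterCounts` is a toy embedder that only counts letters, so it captures spelling overlap rather than meaning. The real system uses an embedding model and a Redis vector index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class BruteForceSemanticCache {

    record Entry(float[] vector, String answer) {}

    private final Function<String, float[]> embedder;    // stand-in for the EmbeddingModel
    private final List<Entry> index = new ArrayList<>(); // stand-in for the Redis index
    private final double threshold;

    BruteForceSemanticCache(Function<String, float[]> embedder, double threshold) {
        this.embedder = embedder;
        this.threshold = threshold;
    }

    void store(String query, String answer) {
        index.add(new Entry(embedder.apply(query), answer));
    }

    Optional<String> lookup(String query) {
        float[] q = embedder.apply(query);              // step 1: embed the query
        Entry best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Entry e : index) {                         // step 2: nearest neighbour (brute force)
            double score = cosine(q, e.vector());
            if (score > bestScore) {
                bestScore = score;
                best = e;
            }
        }
        if (best != null && bestScore >= threshold) {   // step 3: threshold check
            return Optional.of(best.answer());          // step 4: hit, pipeline can be skipped
        }
        return Optional.empty();                        // miss, caller runs the full pipeline
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Toy embedder: a 26-dimensional vector of letter counts (not semantic). */
    static float[] letterCounts(String text) {
        float[] v = new float[26];
        for (char c : text.toLowerCase().toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                v[c - 'a']++;
            }
        }
        return v;
    }

    public static void main(String[] args) {
        BruteForceSemanticCache cache =
                new BruteForceSemanticCache(BruteForceSemanticCache::letterCounts, 0.92);
        cache.store("What is RAG?", "RAG is retrieval-augmented generation.");
        System.out.println(cache.lookup("what is rag").isPresent());                 // true
        System.out.println(cache.lookup("How do I reset my password?").isPresent()); // false
    }
}
```

A brute-force scan is linear in the number of cached entries; a vector index exists precisely to avoid that cost at scale.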

Threshold Choice: 0.92

The 0.92 threshold is deliberately high. A threshold below 0.90 would risk serving a cached answer to a subtly different question, potentially misleading users.

Language Scoping

The Redis index is scoped by language: powerrag:cache:en, powerrag:cache:fr, etc. An English query will never hit a French cache entry, even if they are semantically equivalent — the answers are in different languages.
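As a rough illustration of the scoping rule, the sketch below derives the index name from the language code and keeps one map per index. The `powerrag:cache:` prefix comes from the text; the class `LanguageScopeDemo`, the exact-match maps, and the sample strings are assumptions standing in for the real vector indexes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class LanguageScopeDemo {

    static final String INDEX_PREFIX = "powerrag:cache:";

    /** Builds the per-language index name, e.g. "powerrag:cache:en". */
    static String indexName(String language) {
        return INDEX_PREFIX + language;
    }

    // One map per index name; each map stands in for a separate Redis index.
    private final Map<String, Map<String, String>> indexes = new HashMap<>();

    void store(String query, String language, String answer) {
        indexes.computeIfAbsent(indexName(language), name -> new HashMap<>())
               .put(query, answer);
    }

    /** Searches only the index for the given language. */
    Optional<String> lookup(String query, String language) {
        Map<String, String> index = indexes.get(indexName(language));
        return index == null ? Optional.empty() : Optional.ofNullable(index.get(query));
    }

    public static void main(String[] args) {
        LanguageScopeDemo cache = new LanguageScopeDemo();
        cache.store("What is RAG?", "en", "RAG is retrieval-augmented generation.");
        System.out.println(indexName("fr"));                              // powerrag:cache:fr
        System.out.println(cache.lookup("What is RAG?", "en").isPresent()); // true
        System.out.println(cache.lookup("What is RAG?", "fr").isPresent()); // false
    }
}
```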

TTL: 24 Hours

Cached answers expire after 24 hours. This ensures stale answers (from outdated documents) do not persist indefinitely. If you update a document and re-ingest it, old cached answers about that document will naturally expire within a day.
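The expiry rule itself is simple enough to sketch. In a Redis-backed cache the server would normally drop expired keys on its own; the hypothetical `TtlDemo` class below only illustrates the 24-hour rule with `java.time`, and is not how the project implements it.

```java
import java.time.Duration;
import java.time.Instant;

final class TtlDemo {

    static final Duration TTL = Duration.ofHours(24);

    /** A cached answer plus the moment it was stored. */
    record Entry(String answer, Instant storedAt) {

        /** True once 24 hours or more have passed since the entry was stored. */
        boolean isExpired(Instant now) {
            return !now.isBefore(storedAt.plus(TTL));
        }
    }

    public static void main(String[] args) {
        Instant storedAt = Instant.parse("2024-01-01T00:00:00Z");
        Entry entry = new Entry("cached answer", storedAt);
        System.out.println(entry.isExpired(storedAt.plus(Duration.ofHours(23)))); // false
        System.out.println(entry.isExpired(storedAt.plus(Duration.ofHours(25)))); // true
    }
}
```

Passing the current time in as a parameter, rather than calling `Instant.now()` inside the method, keeps the rule easy to test.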

NoOpSemanticCacheService for Tests

NoOpSemanticCacheService.java
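import java.util.List;
import java.util.Optional;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;

@Profile("test")
@Component
public class NoOpSemanticCacheService implements SemanticCache {
    @Override public Optional<CacheHit> lookup(String query, String language) { return Optional.empty(); }
    @Override public void store(String query, String language, String answer,
                                double confidence, List<SourceRef> sources, String modelId) { /* no-op */ }
}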

The @Profile("test") annotation activates this bean only when running with the test Spring profile. Unit and integration tests don't need Redis — the no-op implementation always returns a cache miss and discards stores silently.