Semantic Caching
The SemanticCache Interface
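import java.util.List;
import java.util.Optional;

public interface SemanticCache {
    // Returns a cached answer for a semantically similar query in the same language, if any.
    Optional<CacheHit> lookup(String query, String language);
    // Caches an answer together with its confidence, sources, and the model that produced it.
    void store(String query, String language, String answer,
               double confidence, List<SourceRef> sources, String modelId);
}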
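A caller would typically wrap this interface in a cache-aside flow: look up first, and only run the full pipeline on a miss. The sketch below is illustrative rather than Power RAG's actual service code. The fields of CacheHit and SourceRef, the answer helper, the pipeline function, and the placeholder values passed to store are all assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class CacheAsideSketch {
    // Hypothetical shapes: the source does not show the fields of these types.
    record SourceRef(String documentId) {}
    record CacheHit(String answer, double confidence, List<SourceRef> sources, String modelId) {}

    // Same two methods as the SemanticCache interface above.
    interface SemanticCache {
        Optional<CacheHit> lookup(String query, String language);
        void store(String query, String language, String answer,
                   double confidence, List<SourceRef> sources, String modelId);
    }

    /** Returns the cached answer on a hit; otherwise runs the pipeline and caches its answer. */
    static String answer(SemanticCache cache, String query, String language,
                         Function<String, String> pipeline) {
        Optional<CacheHit> hit = cache.lookup(query, language);
        if (hit.isPresent()) {
            return hit.get().answer();          // hit: skip retrieval and generation
        }
        String fresh = pipeline.apply(query);   // miss: do the expensive work
        // Confidence, sources, and model id are placeholder values for this sketch.
        cache.store(query, language, fresh, 1.0, List.of(), "placeholder-model");
        return fresh;
    }
}
```

With this shape, the second call for the same (or a sufficiently similar) query never reaches the pipeline.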
How Redis Vector Search Works
Power RAG uses Redis Stack 7.x with the RedisVectorStore from Spring AI. The lookup process:
1. The EmbeddingModel (gemini-embedding-001, 768 dimensions) converts the query string into a float vector.
2. The powerrag:cache:{lang} Redis index is searched for the nearest neighbour vector using cosine similarity.
3. If the similarity meets the 0.92 threshold → hit, return the CacheHit. Otherwise → miss, return Optional.empty().
Threshold Choice: 0.92
The 0.92 threshold is deliberately high. Consider these examples:
- "Who is the CEO?" and "Who is the Chief Executive Officer?" → cosine similarity ~0.97 → HIT (same meaning)
- "Who is the CEO?" and "What year was the company founded?" → cosine similarity ~0.61 → MISS (different topic)
- "What is our leave policy?" and "How many days of annual leave do I get?" → ~0.94 → HIT
A threshold below 0.90 would risk serving a cached answer about a subtly different question, potentially misleading users.
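The hit-or-miss decision comes down to one cosine similarity and one comparison. The following is a minimal sketch of that arithmetic, not the Redis or Spring AI code path: in the real system the nearest neighbour search runs inside Redis. The class and method names are hypothetical, and treating a score of exactly 0.92 as a hit is an assumption.

```java
import java.util.Optional;

final class ThresholdSketch {
    static final double THRESHOLD = 0.92;

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        // A zero vector yields NaN, which fails the comparison below and is treated as a miss.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the cached answer only if the nearest stored vector is similar enough. */
    static Optional<String> decide(float[] query, float[] nearest, String cachedAnswer) {
        // Assumption: the comparison is inclusive (>=); the source only states the value 0.92.
        return cosine(query, nearest) >= THRESHOLD
                ? Optional.of(cachedAnswer)
                : Optional.empty();
    }
}
```

Raising THRESHOLD trades hit rate for precision: fewer queries are answered from the cache, but those that are match the stored question more closely.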
Language Scoping
The Redis index is scoped by language: powerrag:cache:en, powerrag:cache:fr, etc. An English query will never hit a French cache entry, even if the two questions are semantically equivalent, because the cached answer would be in the wrong language.
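Deriving the index name from the language code could look like the sketch below. The powerrag:cache: prefix comes from the text above. The class name, the indexFor helper, and the trimming and lower-casing of the code are assumptions for illustration, not confirmed behaviour.

```java
import java.util.Locale;

final class CacheIndexNames {
    static final String PREFIX = "powerrag:cache:";

    /** Maps a language code such as "en" to its index name, e.g. "powerrag:cache:en". */
    static String indexFor(String language) {
        if (language == null || language.isBlank()) {
            throw new IllegalArgumentException("language must not be blank");
        }
        // Assumption: codes are normalised so "EN" and "en" share one index.
        return PREFIX + language.trim().toLowerCase(Locale.ROOT);
    }
}
```

Because lookup and store would both resolve the index through the same function, an entry stored under one language is invisible to queries in another.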
TTL: 24 Hours
Cached answers expire after 24 hours, so stale answers (from outdated documents) do not persist indefinitely. If you update a document and re-ingest it, old cached answers about that document are not invalidated immediately; they can still be served until their TTL runs out, at most 24 hours after they were stored.
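The expiry window can be made concrete with a little date arithmetic. This sketch only illustrates the 24-hour rule; in the real system the expiry would presumably be enforced by Redis itself rather than checked in Java. The class and method names are hypothetical, and treating the exact 24-hour mark as expired is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

final class CacheTtlSketch {
    static final Duration TTL = Duration.ofHours(24);

    /** True once at least 24 hours have passed since the entry was stored. */
    static boolean isExpired(Instant storedAt, Instant now) {
        return !now.isBefore(storedAt.plus(TTL));
    }
}
```

For example, an answer cached at 09:00 on one day is still eligible at 08:59 the next morning, even if its source document was re-ingested in between.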
NoOpSemanticCacheService for Tests
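import java.util.List;
import java.util.Optional;
import org.springframework.context.annotation.Profile;
import org.springframework.stereotype.Component;

@Profile("test")
@Component
public class NoOpSemanticCacheService implements SemanticCache {
    @Override public Optional<CacheHit> lookup(String q, String l) { return Optional.empty(); }
    @Override public void store(String q, String l, String answer, double confidence,
                                List<SourceRef> sources, String modelId) { /* no-op */ }
}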
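The @Profile("test") annotation activates this bean only when running with the test Spring profile. Unit and integration tests therefore don't need Redis: the no-op implementation always returns a cache miss and silently discards stores. This assumes the Redis-backed implementation is not also active under that profile (for example, by annotating it with @Profile("!test")); otherwise Spring would find two candidate SemanticCache beans.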