Embeddings
What is an Embedding?
An embedding is a dense float array (a vector) that represents the semantic meaning of a piece of text. The embedding model reads the text and outputs a fixed-length array of floating-point numbers — 768 numbers in Power RAG's case.
Two text fragments with similar meaning will have vectors that are close in this 768-dimensional space, measured by cosine similarity. "Car costs" and "automobile pricing" are semantically similar — their vectors will be close even though they share no words.
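Cosine similarity is just the angle between two vectors, independent of their length. A minimal sketch (a standalone helper, not part of Power RAG's code; the tiny 3-dimensional vectors stand in for real 768-dimensional embeddings):

```java
// Cosine similarity between two embedding vectors.
// Values near 1.0 mean the texts point in the same semantic direction.
public class CosineSimilarity {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy stand-ins: "car costs" and "automobile pricing" would map to
        // nearby vectors, an unrelated text to a distant one.
        float[] carCosts = {0.9f, 0.1f, 0.0f};
        float[] autoPricing = {0.8f, 0.2f, 0.1f};
        float[] unrelated = {0.0f, 0.1f, 0.9f};

        System.out.println(cosine(carCosts, autoPricing)); // close to 1.0
        System.out.println(cosine(carCosts, unrelated));   // close to 0.0
    }
}
```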
gemini-embedding-001
Power RAG uses Google’s gemini-embedding-001 through Spring AI’s Google GenAI integration. The app requests 768-dimensional vectors so they align with the Qdrant collection (the API default is higher; dimensions are set explicitly in config). Ollama embedding autoconfiguration is excluded so the knowledge base and semantic cache always use the same cloud embedding model.
- 768 dimensions — must match `spring.ai.vectorstore.qdrant.dimensions`
- Requires `GOOGLE_API_KEY` — same key as Gemini chat / Imagen in this project
- Consistent vectors — ingestion, hybrid retrieval, and Redis semantic cache share one `EmbeddingModel` bean
The Embedding Pipeline
How Spring AI Wires It
The Google GenAI starter provides the EmbeddingModel bean. VectorStoreConfig passes it to QdrantVectorStore, which embeds chunks on ingest and embeds the query at search time. The semantic cache service uses the same bean for cache lookup and storage.
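The wiring can be sketched roughly as follows. This is an illustrative configuration under assumed names (`VectorStoreConfig`, the `knowledge_base` collection name) — the actual bean definitions may differ, and `QdrantVectorStore.builder(...)` is the Spring AI 1.x builder API:

```java
import io.qdrant.client.QdrantClient;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.qdrant.QdrantVectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class VectorStoreConfig {

    // The EmbeddingModel is autoconfigured by the Google GenAI starter.
    // Injecting the same instance here means chunks embedded at ingest time
    // and queries embedded at search time live in the same vector space.
    @Bean
    VectorStore vectorStore(QdrantClient qdrantClient, EmbeddingModel embeddingModel) {
        return QdrantVectorStore.builder(qdrantClient, embeddingModel)
                .collectionName("knowledge_base") // hypothetical collection name
                .build();
    }
}
```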
```yaml
spring:
  autoconfigure:
    exclude: org.springframework.ai.model.ollama.autoconfigure.OllamaEmbeddingAutoConfiguration
  ai:
    google:
      genai:
        api-key: ${GOOGLE_API_KEY:}
        embedding:
          api-key: ${GOOGLE_API_KEY:}
          text:
            options:
              model: gemini-embedding-001
              dimensions: 768
    vectorstore:
      qdrant:
        dimensions: 768 # must match embedding output
```
The `dimensions` value in the Qdrant config must exactly match the embedding output size. If you change models or dimensions, drop and recreate the Qdrant collection — existing vectors are not compatible.

Trade-offs: managed embeddings vs local
Using Google’s embedding API adds network latency and per-token cost, but keeps one consistent model for dev, CI, and production and avoids running a separate embedding stack. For strict air-gapped deployments you could swap in a local EmbeddingModel and recreate the collection; the Spring AI abstractions stay the same.
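One way such a swap could look, sketched with Spring AI's ONNX-backed `TransformersEmbeddingModel` (which defaults to all-MiniLM-L6-v2 at 384 dimensions — so `spring.ai.vectorstore.qdrant.dimensions` would need to change and the collection be recreated). The class name is real Spring AI; the config class is hypothetical:

```java
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.transformers.TransformersEmbeddingModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class LocalEmbeddingConfig {

    // Overrides the cloud embedding model with a local ONNX model.
    // Everything downstream (QdrantVectorStore, semantic cache) still sees
    // a plain EmbeddingModel, so no other code changes.
    @Bean
    EmbeddingModel embeddingModel() throws Exception {
        TransformersEmbeddingModel model = new TransformersEmbeddingModel();
        model.afterPropertiesSet(); // loads the ONNX model before first use
        return model;
    }
}
```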