Embeddings
What is an Embedding?
An embedding is a dense float array (a vector) that represents the semantic meaning of a piece of text. The embedding model reads the text and outputs a fixed-length array of floating-point numbers — 768 numbers in Power RAG's case.
Two text fragments with similar meaning will have vectors that are close in this 768-dimensional space, measured by cosine similarity. "Car costs" and "automobile pricing" are semantically similar — their vectors will be close even though they share no words.
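Cosine similarity is just the angle between two vectors, independent of their length. A minimal sketch (a standalone helper, not part of Power RAG's code; the tiny 3-dimensional vectors stand in for real 768-dimensional embeddings):

```java
// Cosine similarity between two embedding vectors.
// Values near 1.0 mean the texts point in the same semantic direction.
public class CosineSimilarity {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy stand-ins: "car costs" and "automobile pricing" would map to
        // nearby vectors, an unrelated text to a distant one.
        float[] carCosts = {0.9f, 0.1f, 0.0f};
        float[] autoPricing = {0.8f, 0.2f, 0.1f};
        float[] unrelated = {0.0f, 0.1f, 0.9f};

        System.out.println(cosine(carCosts, autoPricing)); // close to 1.0
        System.out.println(cosine(carCosts, unrelated));   // close to 0.0
    }
}
```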
gemini-embedding-001
Power RAG uses Google’s gemini-embedding-001 through Spring AI’s Google GenAI integration. The app requests 768-dimensional vectors so they align with the Qdrant collection (the API default is higher; dimensions are set explicitly in config). Ollama embedding autoconfiguration is excluded so the knowledge base and semantic cache always use the same cloud embedding model.
- 768 dimensions — must match `spring.ai.vectorstore.qdrant.dimensions`
- Requires `GOOGLE_API_KEY` — same key as Gemini chat / Imagen in this project
- Consistent vectors — ingestion, hybrid retrieval, and Redis semantic cache share one `EmbeddingModel` bean
The Embedding Pipeline
How Spring AI Wires It
The Google GenAI starter provides the EmbeddingModel bean. VectorStoreConfig passes it to QdrantVectorStore, which embeds chunks on ingest and embeds the query at search time. The semantic cache service uses the same bean for cache lookup and storage.
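The wiring can be sketched roughly as follows. This is an illustrative configuration under assumed names (`VectorStoreConfig`, the `knowledge_base` collection name) — the actual bean definitions may differ, and `QdrantVectorStore.builder(...)` is the Spring AI 1.x builder API:

```java
import io.qdrant.client.QdrantClient;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.qdrant.QdrantVectorStore;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class VectorStoreConfig {

    // The EmbeddingModel is autoconfigured by the Google GenAI starter.
    // Injecting the same instance here means chunks embedded at ingest time
    // and queries embedded at search time live in the same vector space.
    @Bean
    VectorStore vectorStore(QdrantClient qdrantClient, EmbeddingModel embeddingModel) {
        return QdrantVectorStore.builder(qdrantClient, embeddingModel)
                .collectionName("knowledge_base") // hypothetical collection name
                .build();
    }
}
```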
```yaml
spring:
  autoconfigure:
    exclude: org.springframework.ai.model.ollama.autoconfigure.OllamaEmbeddingAutoConfiguration
  ai:
    google:
      genai:
        api-key: ${GOOGLE_API_KEY:}
        embedding:
          api-key: ${GOOGLE_API_KEY:}
          text:
            options:
              model: gemini-embedding-001
              dimensions: 768
    vectorstore:
      qdrant:
        dimensions: 768 # must match embedding output
```
The `dimensions` value in the Qdrant config must exactly match the embedding output size. If you change models or dimensions, drop and recreate the Qdrant collection — existing vectors are not compatible.

Trade-offs: managed embeddings vs local
Using Google’s embedding API adds network latency and per-token cost, but keeps one consistent model for dev, CI, and production and avoids running a separate embedding stack. For strict air-gapped deployments you could swap in a local EmbeddingModel and recreate the collection; the Spring AI abstractions stay the same.
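One way such a swap could look, sketched with Spring AI's ONNX-backed `TransformersEmbeddingModel` (which defaults to all-MiniLM-L6-v2 at 384 dimensions — so `spring.ai.vectorstore.qdrant.dimensions` would need to change and the collection be recreated). The class name is real Spring AI; the config class is hypothetical:

```java
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.transformers.TransformersEmbeddingModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class LocalEmbeddingConfig {

    // Overrides the cloud embedding model with a local ONNX model.
    // Everything downstream (QdrantVectorStore, semantic cache) still sees
    // a plain EmbeddingModel, so no other code changes.
    @Bean
    EmbeddingModel embeddingModel() throws Exception {
        TransformersEmbeddingModel model = new TransformersEmbeddingModel();
        model.afterPropertiesSet(); // loads the ONNX model before first use
        return model;
    }
}
```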