Keyword Search with PostgreSQL FTS

Module 4 · ~9 min read
Vector search is excellent for conceptual queries but struggles with exact terms like product codes, invoice numbers, or proper names. PostgreSQL's built-in Full Text Search (FTS) complements vector search by finding chunks that contain the exact words the user typed.

Why Keyword Search Alongside Vector Search?

Consider a query like "Find invoice INV-2024-001". The embedding of "INV-2024-001" will be generic — any document about invoices will score similarly. The exact string "INV-2024-001" must be matched literally. Keyword search handles this perfectly.

Similarly for:

PostgreSQL tsvector and GIN Index

The document_chunks table stores raw chunk text in a chunk_text column. The migration adds a generated tsvector column that PostgreSQL automatically maintains, and a GIN index for fast FTS queries.

V2__document_ingestion.sql — FTS column and index View source ↗
ALTER TABLE document_chunks
  ADD COLUMN chunk_text_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('english', chunk_text)) STORED;

CREATE INDEX idx_document_chunks_fts
  ON document_chunks USING GIN(chunk_text_tsv);

GENERATED ALWAYS AS (...) STORED means PostgreSQL recomputes the tsvector automatically whenever chunk_text changes — no application code needed to maintain it.

Spring Data FTS Query

DocumentChunkRepository.java — fullTextSearch() View source ↗
@Query(value = """
    SELECT * FROM document_chunks
    WHERE chunk_text_tsv @@ plainto_tsquery('english', :query)
    ORDER BY ts_rank(chunk_text_tsv, plainto_tsquery('english', :query)) DESC
    LIMIT :limit
    """, nativeQuery = true)
List<DocumentChunk> fullTextSearch(@Param("query") String query,
                                   @Param("limit") int limit);

Query Components

Connecting FTS Results to Qdrant

The FTS query returns DocumentChunk entities from PostgreSQL. Each entity carries a qdrantId field — the UUID that was used when the chunk was inserted into Qdrant. The RRF merge uses this ID to deduplicate when both retrieval sources return the same chunk.

plainto_tsquery is simpler and safer than to_tsquery — it handles arbitrary user input without requiring the user to know FTS query syntax. Use to_tsquery only if you need Boolean operators like AND/OR/NOT in the query.