Chunking Strategies
Chunking splits parsed document text into smaller pieces before embedding. The chunk size directly impacts retrieval quality: too large and the embedding captures too many topics; too small and the answer might be split across chunk boundaries. Power RAG uses a sliding window with overlap to balance precision and context continuity.
Why Chunking?
Embedding models have a maximum input length (typically 512–8192 tokens). A 50-page PDF cannot be embedded as a single unit; it must be split. Beyond that hard limit, smaller chunks generally produce better retrieval because:
- Each chunk covers a narrower topic, making its embedding more specific
- The similarity search returns chunks that are precisely about the query, not a page that happens to mention it once
- The retrieved context is denser — more signal per character when injected into the LLM prompt
Size vs Context Trade-offs
| Chunk size | Retrieval precision | Answer completeness | Risk |
|---|---|---|---|
| Very small (~50 words) | High | Low — answer may span multiple chunks | Fragmented answers; missing context |
| Medium (~256–512 words) | Good | Good | Balanced — recommended range |
| Large (~1000+ words) | Low | High | Embedding dilution; less focused retrieval |
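The table's extremes can be made concrete with a quick back-of-the-envelope sketch. Assuming the sliding-window stepping Power RAG uses (step = chunk size minus overlap; note that only the 64-word overlap comes from the actual config, the 8 and 100 values here are illustrative), a 10,000-word document splits into very different chunk counts:

```java
public class ChunkCountSketch {
    // One chunk starts every `step` words, so the count is ceil(totalWords / step).
    static int chunkCount(int totalWords, int chunkSize, int chunkOverlap) {
        int step = Math.max(1, chunkSize - chunkOverlap);
        return (totalWords + step - 1) / step;
    }

    public static void main(String[] args) {
        int totalWords = 10_000;
        System.out.println(chunkCount(totalWords, 50, 8));     // very small → 239 chunks
        System.out.println(chunkCount(totalWords, 512, 64));   // medium     → 23 chunks
        System.out.println(chunkCount(totalWords, 1000, 100)); // large      → 12 chunks
    }
}
```

More chunks means finer-grained retrieval but more embeddings to store and search; fewer chunks means each retrieved hit carries more (and more diluted) context.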
Sliding Window with Overlap
SlidingWindowChunkingStrategy.java
@Component
public class SlidingWindowChunkingStrategy implements ChunkingStrategy {

    private final int chunkSize = 512;    // words per chunk
    private final int chunkOverlap = 64;  // words shared with the previous chunk

    @Override
    public List<Chunk> chunk(List<ParsedSection> sections) {
        List<Chunk> chunks = new ArrayList<>();
        for (ParsedSection section : sections) {
            String[] words = section.getText().split("\\s+");
            int step = Math.max(1, chunkSize - chunkOverlap); // 512 - 64 = 448
            for (int start = 0; start < words.length; start += step) {
                int end = Math.min(start + chunkSize, words.length);
                String text = String.join(" ",
                        Arrays.copyOfRange(words, start, end));
                // store with metadata: chunk_index, start_line, ...
                chunks.add(new Chunk(text)); // constructor shown simplified
            }
        }
        return chunks;
    }
}
Overlap Visualized
With chunkSize=512 and chunkOverlap=64, the step is 448 words. Each consecutive chunk shares 64 words with the previous one:
Word positions in document:
Chunk 1: [word 0 ... word 511]
Chunk 2: [word 448 ... word 959] ← 64-word overlap with Chunk 1
Chunk 3: [word 896 ... word 1407] ← 64-word overlap with Chunk 2
Chunk 4: [word 1344 ... word 1855] ← 64-word overlap with Chunk 3
Overlap zone (chunk 1 / chunk 2):
|-- chunk 1 exclusive --|-- overlap --|-- chunk 2 exclusive --|
word 0 word 448 word 511 word 959
The overlap ensures that sentences near chunk boundaries appear in at least two chunks. A question whose answer straddles the boundary will match one of the two chunks containing it.
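The boundary arithmetic above can be reproduced with a few lines of standalone Java (the 1,900-word document length is hypothetical, chosen so the first four chunks match the diagram):

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapBoundaries {
    // Inclusive [start, end] word positions for each sliding-window chunk.
    static List<int[]> boundaries(int totalWords, int chunkSize, int chunkOverlap) {
        int step = Math.max(1, chunkSize - chunkOverlap);
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start < totalWords; start += step) {
            int end = Math.min(start + chunkSize, totalWords) - 1;
            out.add(new int[] { start, end });
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] b : boundaries(1_900, 512, 64)) {
            System.out.println("[word " + b[0] + " ... word " + b[1] + "]");
        }
        // → [word 0 ... word 511], [word 448 ... word 959], [word 896 ... word 1407],
        //   [word 1344 ... word 1855], [word 1792 ... word 1899]
    }
}
```

Note the final chunk is shorter than 512 words; a trailing partial chunk at the end of a section is expected behavior.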
Configuration
application.yml — chunking config
powerrag:
ingestion:
chunk-size: 512 # words per chunk
chunk-overlap: 64 # words of overlap between consecutive chunks
The 512-word chunk / 64-word overlap configuration is a solid starting point. If your documents are dense (legal contracts, technical specs), consider reducing chunk-size to 256. If they are conversational (transcripts, emails), 512–768 tends to work better.
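In a Spring Boot application these keys would typically be bound through a `@ConfigurationProperties` class. The sketch below is an assumption about how that binding could look; the class name `ChunkingProperties` is hypothetical and not taken from Power RAG:

```java
import org.springframework.boot.context.properties.ConfigurationProperties;

// Hypothetical binding for the powerrag.ingestion.* keys above.
@ConfigurationProperties(prefix = "powerrag.ingestion")
public class ChunkingProperties {
    private int chunkSize = 512;    // powerrag.ingestion.chunk-size
    private int chunkOverlap = 64;  // powerrag.ingestion.chunk-overlap

    public int getChunkSize() { return chunkSize; }
    public void setChunkSize(int chunkSize) { this.chunkSize = chunkSize; }
    public int getChunkOverlap() { return chunkOverlap; }
    public void setChunkOverlap(int chunkOverlap) { this.chunkOverlap = chunkOverlap; }
}
```

Spring Boot's relaxed binding maps the kebab-case key `chunk-size` onto the `chunkSize` field; the class still needs to be registered, for example via `@ConfigurationPropertiesScan` on the application class.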