Chunking Strategies
Chunking splits parsed document text into smaller pieces before embedding. The chunk size directly impacts retrieval quality: too large and the embedding captures too many topics; too small and the answer might be split across chunk boundaries. Power RAG uses a sliding window with overlap to balance precision and context continuity.
Why Chunking?
Embedding models have a maximum input length (typically 512–8192 tokens). A 50-page PDF cannot be embedded as a single unit; it must be split. Beyond that hard limit, smaller chunks generally produce better retrieval because:
- Each chunk covers a narrower topic, making its embedding more specific
- The similarity search returns chunks that are precisely about the query, not a page that happens to mention it once
- The retrieved context is denser — more signal per character when injected into the LLM prompt
Size vs Context Trade-offs
| Chunk size | Retrieval precision | Answer completeness | Risk |
|---|---|---|---|
| Very small (~50 words) | High | Low — answer may span multiple chunks | Fragmented answers; missing context |
| Medium (~256–512 words) | Good | Good | Balanced — recommended range |
| Large (~1000+ words) | Low | High | Embedding dilution; less focused retrieval |
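The table's extremes can be made concrete with a quick back-of-the-envelope sketch. Assuming the sliding-window stepping Power RAG uses (step = chunk size minus overlap; note that only the 64-word overlap comes from the actual config, the 8 and 100 values here are illustrative), a 10,000-word document splits into very different chunk counts:

```java
public class ChunkCountSketch {
    // One chunk starts every `step` words, so the count is ceil(totalWords / step).
    static int chunkCount(int totalWords, int chunkSize, int chunkOverlap) {
        int step = Math.max(1, chunkSize - chunkOverlap);
        return (totalWords + step - 1) / step;
    }

    public static void main(String[] args) {
        int totalWords = 10_000;
        System.out.println(chunkCount(totalWords, 50, 8));     // very small → 239 chunks
        System.out.println(chunkCount(totalWords, 512, 64));   // medium     → 23 chunks
        System.out.println(chunkCount(totalWords, 1000, 100)); // large      → 12 chunks
    }
}
```

More chunks means finer-grained retrieval but more embeddings to store and search; fewer chunks means each retrieved hit carries more (and more diluted) context.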
Sliding Window with Overlap
SlidingWindowChunkingStrategy.java
@Component
public class SlidingWindowChunkingStrategy implements ChunkingStrategy {

    private final int chunkSize = 512;    // words per chunk
    private final int chunkOverlap = 64;  // words shared with the previous chunk

    @Override
    public List<Chunk> chunk(List<ParsedSection> sections) {
        List<Chunk> chunks = new ArrayList<>();
        for (ParsedSection section : sections) {
            String[] words = section.getText().split("\\s+");
            int step = Math.max(1, chunkSize - chunkOverlap); // 512 - 64 = 448
            for (int start = 0; start < words.length; start += step) {
                int end = Math.min(start + chunkSize, words.length);
                String text = String.join(" ",
                        Arrays.copyOfRange(words, start, end));
                // store with metadata: chunk_index, start_line, ...
                chunks.add(new Chunk(text)); // constructor shown simplified
            }
        }
        return chunks;
    }
}
Overlap Visualized
With chunkSize=512 and chunkOverlap=64, the step is 448 words. Each consecutive chunk shares 64 words with the previous one:
Word positions in document:
Chunk 1: [word 0 ... word 511]
Chunk 2: [word 448 ... word 959] ← 64-word overlap with Chunk 1
Chunk 3: [word 896 ... word 1407] ← 64-word overlap with Chunk 2
Chunk 4: [word 1344 ... word 1855] ← 64-word overlap with Chunk 3
Overlap zone (chunk 1 / chunk 2):
|-- chunk 1 exclusive --|-- overlap --|-- chunk 2 exclusive --|
word 0 word 448 word 511 word 959
The overlap ensures that sentences near chunk boundaries appear in at least two chunks. A question whose answer straddles the boundary will match one of the two chunks containing it.
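The boundary arithmetic above can be reproduced with a few lines of standalone Java (the 1,900-word document length is hypothetical, chosen so the first four chunks match the diagram):

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapBoundaries {
    // Inclusive [start, end] word positions for each sliding-window chunk.
    static List<int[]> boundaries(int totalWords, int chunkSize, int chunkOverlap) {
        int step = Math.max(1, chunkSize - chunkOverlap);
        List<int[]> out = new ArrayList<>();
        for (int start = 0; start < totalWords; start += step) {
            int end = Math.min(start + chunkSize, totalWords) - 1;
            out.add(new int[] { start, end });
        }
        return out;
    }

    public static void main(String[] args) {
        for (int[] b : boundaries(1_900, 512, 64)) {
            System.out.println("[word " + b[0] + " ... word " + b[1] + "]");
        }
        // → [word 0 ... word 511], [word 448 ... word 959], [word 896 ... word 1407],
        //   [word 1344 ... word 1855], [word 1792 ... word 1899]
    }
}
```

Note the final chunk is shorter than 512 words; a trailing partial chunk at the end of a section is expected behavior.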
Configuration
application.yml — chunking config
powerrag:
ingestion:
chunk-size: 512 # words per chunk
chunk-overlap: 64 # words of overlap between consecutive chunks
The 512-word chunk / 64-word overlap configuration is a solid starting point. If your documents are dense (legal contracts, technical specs), consider reducing chunk-size to 256. If they are conversational (transcripts, emails), 512–768 tends to work better.
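In a Spring Boot application these keys would typically be bound through a `@ConfigurationProperties` class. The sketch below is an assumption about how that binding could look; the class name `ChunkingProperties` is hypothetical and not taken from Power RAG:

```java
import org.springframework.boot.context.properties.ConfigurationProperties;

// Hypothetical binding for the powerrag.ingestion.* keys above.
@ConfigurationProperties(prefix = "powerrag.ingestion")
public class ChunkingProperties {
    private int chunkSize = 512;    // powerrag.ingestion.chunk-size
    private int chunkOverlap = 64;  // powerrag.ingestion.chunk-overlap

    public int getChunkSize() { return chunkSize; }
    public void setChunkSize(int chunkSize) { this.chunkSize = chunkSize; }
    public int getChunkOverlap() { return chunkOverlap; }
    public void setChunkOverlap(int chunkOverlap) { this.chunkOverlap = chunkOverlap; }
}
```

Spring Boot's relaxed binding maps the kebab-case key `chunk-size` onto the `chunkSize` field; the class still needs to be registered, for example via `@ConfigurationPropertiesScan` on the application class.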