Context Assembly
Retrieved chunks are assembled into a single prompt context, each labelled so the LLM can cite it with [SOURCE N] citations.
The [SOURCE N] Format
Each chunk is rendered as a labelled block. The label [SOURCE N] is what the LLM includes in its answer when citing that passage:
ContextAssembler.assemble()
for (int i = 0; i < chunks.size(); i++) {
    RetrievedChunk c = chunks.get(i);          // chunks arrive in RRF-score order
    String ref = buildRef(c);                  // "filename § section"
    String entry = String.format("[SOURCE %d] %s%n%s%n%n", i + 1, ref, c.text());
    if (totalChars + entry.length() > maxContextChars) break; // 24,000-char cap
    sb.append(entry);
    totalChars += entry.length();
}
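A minimal, self-contained sketch of the loop above. RetrievedChunk is replaced here by a simplified Chunk record, and buildRef is reduced to string concatenation; both are assumptions for illustration, not the real types:

```java
import java.util.List;

public class ContextAssemblerSketch {
    // Simplified stand-in for the real RetrievedChunk type (assumption).
    record Chunk(String fileName, String section, String text) {}

    static String buildRef(Chunk c) {
        return c.fileName() + " § " + c.section();
    }

    static String assemble(List<Chunk> chunks, int maxContextChars) {
        StringBuilder sb = new StringBuilder();
        int totalChars = 0;
        for (int i = 0; i < chunks.size(); i++) {
            Chunk c = chunks.get(i);
            String entry = String.format("[SOURCE %d] %s%n%s%n%n",
                    i + 1, buildRef(c), c.text());
            // Greedy cap: stop as soon as the next entry would overflow.
            if (totalChars + entry.length() > maxContextChars) break;
            sb.append(entry);
            totalChars += entry.length();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ctx = assemble(List.of(
                new Chunk("guide.md", "Setup", "Install the CLI first."),
                new Chunk("guide.md", "Usage", "Run the assemble command.")),
                24_000);
        System.out.print(ctx);
    }
}
```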
The 24,000 Character Cap
LLMs have finite context windows. Claude Sonnet's context window is large (~200k tokens), but injecting all available text would:
- Increase cost proportionally to token count
- Increase latency
- Risk diluting the most relevant passages with lower-quality ones
24,000 characters ≈ 6,000 tokens, which provides substantial context while leaving ample room for the question and response within any reasonable context window.
The cap is applied greedily: chunks are added in RRF-score order until the cap is hit. The most relevant chunks (highest RRF score) are always included.
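The characters-to-tokens conversion above uses the common rule of thumb of roughly four characters per token for English prose; the exact ratio depends on the tokenizer, so treat this as an estimate only:

```java
public class TokenEstimate {
    // Rough heuristic: ~4 characters per token for English text.
    // The true ratio varies with the tokenizer and the content.
    static int estimateTokens(int chars) {
        return chars / 4;
    }

    public static void main(String[] args) {
        System.out.println(estimateTokens(24_000)); // prints 6000
    }
}
```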
Source Citation Extraction
After the LLM produces its answer, the same chunks are also used to build a structured List<SourceRef> — returned to the frontend alongside the answer text for display as a citations panel.
public List<SourceRef> extractSources(List<RetrievedChunk> chunks) {
    return chunks.stream()
        .map(c -> new SourceRef(
            str(c.metadata().get("file_name")),
            buildRef(c),                              // "filename § section"
            snippet(c.text()),                        // first 200 chars
            toInt(c.metadata().get("page_number")),   // PDFs
            toInt(c.metadata().get("row_number") != null
                ? c.metadata().get("row_number")      // tabular sources
                : c.metadata().get("start_line")),    // text/code sources
            str(c.metadata().get("document_id"))))
        .toList();
}
Each SourceRef carries enough information for the frontend to render a clickable citation that links back to the original document (by document_id) and the specific page or row.
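A plausible shape for SourceRef, inferred from the constructor arguments in extractSources() above; the field names, types, and ordering here are assumptions, not the actual definition:

```java
// Hypothetical reconstruction of SourceRef from the call site (assumption).
public record SourceRef(
        String fileName,    // "file_name" metadata
        String ref,         // "filename § section" display label
        String snippet,     // first 200 characters of the chunk text
        Integer pageNumber, // page within the original document (PDFs)
        Integer line,       // row_number (tabular) or start_line (text/code)
        String documentId   // links the citation back to the source document
) {}
```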
The 200-character snippet is what the citations panel displays as a preview: short enough to scan quickly, long enough to confirm the passage is relevant.
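The snippet helper referenced above could be implemented along these lines; the whitespace collapsing and ellipsis are assumptions, with only the 200-character length taken from the text:

```java
public class SnippetHelper {
    // Trim chunk text to a short, single-line preview for the citations panel.
    static String snippet(String text) {
        if (text == null) return "";
        // Collapse internal whitespace so the preview reads as one line (assumption).
        String oneLine = text.replaceAll("\\s+", " ").trim();
        return oneLine.length() <= 200
                ? oneLine
                : oneLine.substring(0, 200) + "…";
    }
}
```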