Context Assembly
Retrieved chunks are assembled into a single prompt context, each labelled so the LLM can cite it with [SOURCE N] citations.
The [SOURCE N] Format
Each chunk is rendered as a labelled block. The label [SOURCE N] is what the LLM includes in its answer when citing that passage:
ContextAssembler.assemble()
for (int i = 0; i < chunks.size(); i++) {
    RetrievedChunk c = chunks.get(i);          // chunks arrive in RRF-score order
    String ref = buildRef(c);                  // "filename § section"
    String entry = String.format("[SOURCE %d] %s%n%s%n%n", i + 1, ref, c.text());
    if (totalChars + entry.length() > maxContextChars) break; // 24,000-char cap
    sb.append(entry);
    totalChars += entry.length();
}
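A minimal, self-contained sketch of the loop above. RetrievedChunk is replaced here by a simplified Chunk record, and buildRef is reduced to string concatenation; both are assumptions for illustration, not the real types:

```java
import java.util.List;

public class ContextAssemblerSketch {
    // Simplified stand-in for the real RetrievedChunk type (assumption).
    record Chunk(String fileName, String section, String text) {}

    static String buildRef(Chunk c) {
        return c.fileName() + " § " + c.section();
    }

    static String assemble(List<Chunk> chunks, int maxContextChars) {
        StringBuilder sb = new StringBuilder();
        int totalChars = 0;
        for (int i = 0; i < chunks.size(); i++) {
            Chunk c = chunks.get(i);
            String entry = String.format("[SOURCE %d] %s%n%s%n%n",
                    i + 1, buildRef(c), c.text());
            // Greedy cap: stop as soon as the next entry would overflow.
            if (totalChars + entry.length() > maxContextChars) break;
            sb.append(entry);
            totalChars += entry.length();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ctx = assemble(List.of(
                new Chunk("guide.md", "Setup", "Install the CLI first."),
                new Chunk("guide.md", "Usage", "Run the assemble command.")),
                24_000);
        System.out.print(ctx);
    }
}
```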
The 24,000 Character Cap
LLMs have finite context windows. Claude Sonnet's context window is large (~200k tokens), but injecting all available text would:
- Increase cost proportionally to token count
- Increase latency
- Risk diluting the most relevant passages with lower-quality ones
24,000 characters ≈ 6,000 tokens, which provides substantial context while leaving ample room for the question and response within any reasonable context window.
The cap is applied greedily: chunks are added in RRF-score order until the cap is hit. The most relevant chunks (highest RRF score) are always included.
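The characters-to-tokens conversion above uses the common rule of thumb of roughly four characters per token for English prose; the exact ratio depends on the tokenizer, so treat this as an estimate only:

```java
public class TokenEstimate {
    // Rough heuristic: ~4 characters per token for English text.
    // The true ratio varies with the tokenizer and the content.
    static int estimateTokens(int chars) {
        return chars / 4;
    }

    public static void main(String[] args) {
        System.out.println(estimateTokens(24_000)); // prints 6000
    }
}
```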
Source Citation Extraction
After the LLM produces its answer, the same chunks are also used to build a structured List<SourceRef> — returned to the frontend alongside the answer text for display as a citations panel.
public List<SourceRef> extractSources(List<RetrievedChunk> chunks) {
    return chunks.stream()
        .map(c -> new SourceRef(
            str(c.metadata().get("file_name")),
            buildRef(c),                              // "filename § section"
            snippet(c.text()),                        // first 200 chars
            toInt(c.metadata().get("page_number")),   // PDFs
            toInt(c.metadata().get("row_number") != null
                ? c.metadata().get("row_number")      // tabular sources
                : c.metadata().get("start_line")),    // text/code sources
            str(c.metadata().get("document_id"))))
        .toList();
}
Each SourceRef carries enough information for the frontend to render a clickable citation that links back to the original document (by document_id) and the specific page or row.
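A plausible shape for SourceRef, inferred from the constructor arguments in extractSources() above; the field names, types, and ordering here are assumptions, not the actual definition:

```java
// Hypothetical reconstruction of SourceRef from the call site (assumption).
public record SourceRef(
        String fileName,    // "file_name" metadata
        String ref,         // "filename § section" display label
        String snippet,     // first 200 characters of the chunk text
        Integer pageNumber, // page within the original document (PDFs)
        Integer line,       // row_number (tabular) or start_line (text/code)
        String documentId   // links the citation back to the source document
) {}
```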
The 200-character snippet is what the citations panel displays as a preview: short enough to scan quickly, long enough to confirm the passage is relevant.
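The snippet helper referenced above could be implemented along these lines; the whitespace collapsing and ellipsis are assumptions, with only the 200-character length taken from the text:

```java
public class SnippetHelper {
    // Trim chunk text to a short, single-line preview for the citations panel.
    static String snippet(String text) {
        if (text == null) return "";
        // Collapse internal whitespace so the preview reads as one line (assumption).
        String oneLine = text.replaceAll("\\s+", " ").trim();
        return oneLine.length() <= 200
                ? oneLine
                : oneLine.substring(0, 200) + "…";
    }
}
```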