
Contextual Retrieval - Anthropic's Approach

Chunks lose context when split from the original document


Chunking was introduced to handle large documents: feeding an entire large piece of information to an LLM blows past context limits and hits API rate limits. The trade-off is that when you split the original document into chunks, each chunk loses the context of the surrounding document.

Example :

Original Document: Company Financial Report Q3 2024

  1. Revenue increased 15% YoY to $2.5B
  2. Cost of goods sold remained stable at 40% of revenue
  3. Operating expenses decreased by 5% due to efficiency improvements

Traditional Chunking -

  1. Chunk 1 - Revenue increased 15% year on year to $2.5B
  2. Chunk 2 - Cost of goods sold remained stable at 40% of revenue
  3. Chunk 3 - Operating expenses decreased 5% due to efficiency improvements
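Traditional chunking is usually just a fixed-size split over the raw text. A minimal sketch in Python (the chunk sizes and sample text are illustrative, not a production chunker):

```python
def chunk_text(text: str, chunk_size: int = 80, overlap: int = 10) -> list[str]:
    """Naive fixed-size chunking: slide a window of chunk_size characters
    across the text, with a small overlap so sentences are cut less harshly."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

report = (
    "Company Financial Report Q3 2024. "
    "Revenue increased 15% YoY to $2.5B. "
    "Cost of goods sold remained stable at 40% of revenue. "
    "Operating expenses decreased by 5% due to efficiency improvements."
)
chunks = chunk_text(report)
# Only the first chunk carries "Q3 2024"; the chunk holding the operating
# expenses sentence has lost that context entirely.
```

Only the first chunk retains the report title; every later chunk is orphaned from it, which is exactly the problem described below.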

Problems with traditional chunking

Query - 'What were the Q3 operating expenses?'

Because chunk 3 never mentions Q3 2024, it might not be retrieved for this query, or it might be confused with expense figures from other quarters.

Anthropic's Solution: Add Context to Each Chunk

The prompt template:

    <document>
    {whole document}
    </document>
    Here is the chunk we want to situate within the whole document:
    <chunk>
    {chunk content}
    </chunk>
    Please give a short succinct context (50-100 tokens) to situate this chunk
    within the overall document for retrieval purposes.
    Answer only with the succinct context and nothing else.

Generated contextual chunk:

  1. Original chunk: "Operating expenses decreased 5% due to efficiency improvements"
  2. Enhanced chunk: "This chunk is from the Company Financial Report Q3 2024. Operating expenses decreased 5% due to efficiency improvements."
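In code, the context-generation step is one LLM call per chunk at indexing time. A sketch of the prompt assembly, where `generate` is a hypothetical stand-in for whatever LLM client you use (it is an assumption, not a real API):

```python
def build_context_prompt(document: str, chunk: str) -> str:
    """Assemble the contextualization prompt: whole document,
    then the chunk to situate, then the instruction."""
    return (
        f"<document>\n{document}\n</document>\n"
        "Here is the chunk we want to situate within the whole document:\n"
        f"<chunk>\n{chunk}\n</chunk>\n"
        "Please give a short succinct context (50-100 tokens) to situate this "
        "chunk within the overall document for retrieval purposes. "
        "Answer only with the succinct context and nothing else."
    )

def contextualize(document: str, chunk: str, generate) -> str:
    """Prepend the generated context to the chunk before embedding/indexing.
    `generate(prompt) -> str` is a placeholder for your LLM call."""
    return f"{generate(build_context_prompt(document, chunk))} {chunk}"
```

Since the full document is resent for every chunk, this step pairs naturally with prompt caching to cut the repeated input cost.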

With the added context, retrieval now works:

QUERY - "What were the Q3 2024 operating expenses?"

  1. Traditional chunk: might miss or rank low
  2. Contextual chunk: high relevance (contains "Q3 2024" and "operating expenses")
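To make the ranking difference concrete, here is a toy keyword-overlap scorer, a crude stand-in for real embedding or BM25 similarity (the scoring function is an illustrative assumption, not Anthropic's method):

```python
import re

def overlap_score(query: str, chunk: str) -> int:
    """Count distinct query words that also appear in the chunk
    (case-insensitive). A crude proxy for retrieval similarity."""
    words = lambda s: set(re.findall(r"\w+", s.lower()))
    return len(words(query) & words(chunk))

query = "What were the Q3 2024 operating expenses?"
traditional = "Operating expenses decreased 5% due to efficiency improvements"
contextual = ("This chunk is from the Company Financial Report Q3 2024. "
              "Operating expenses decreased 5% due to efficiency improvements")

print(overlap_score(query, traditional))  # 2 ("operating", "expenses")
print(overlap_score(query, contextual))   # 5 (adds "the", "q3", "2024")
```

Even this crude scorer ranks the contextual chunk higher, because the added sentence carries exactly the terms the query uses.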

COST BENEFIT ANALYSIS:

  1. Document: 10,000 tokens in total
  2. Chunks : 40 chunks of 250 tokens each

CONTEXT GENERATION

  1. Input: 10,000 (document) + 250 (chunk) = 10,250 tokens per chunk
  2. Output: ~75 tokens of context per chunk
  3. Total input: 40 x 10,250 = 410,000 tokens
  4. Total output: 40 x 75 = 3,000 tokens

At $0.01/1K input tokens and $0.03/1K output tokens:

  1. Input cost: 410 x $0.01 = $4.10
  2. Output cost: 3 x $0.03 = $0.09

Total: $4.19 per document (a one-time indexing cost)
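The arithmetic can be checked with a short script; the per-token prices and token counts are the illustrative figures from this example, not a real provider's price list:

```python
doc_tokens = 10_000      # the whole document is resent with every chunk
chunk_tokens = 250
num_chunks = 40
context_tokens = 75      # generated context per chunk

input_tokens = num_chunks * (doc_tokens + chunk_tokens)  # 40 x 10,250
output_tokens = num_chunks * context_tokens              # 40 x 75

input_cost = input_tokens / 1000 * 0.01    # $0.01 per 1K input tokens
output_cost = output_tokens / 1000 * 0.03  # $0.03 per 1K output tokens
total_cost = input_cost + output_cost
print(f"${total_cost:.2f}")  # $4.19 one-time indexing cost per document
```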

Benefits: improved retrieval accuracy, with gains of roughly 20-25 percentage points.

Before: 67-70% context precision. After: 85-90% context precision.

That translates into roughly a 30-40% reduction in retrieval errors.

Real-world example: customer support at a company.

  1. 10,000 queries per day
  2. 30% fewer wrong contexts -> 3,000 fewer escalations per day
  3. Cost per escalation (for example): $5
  4. Daily savings: $15,000
  5. Monthly savings: $450,000

ROI: a $4.19 one-time indexing cost vs. $450K in monthly savings, a return on the order of 100,000x.
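The same back-of-the-envelope savings as a script; every number is the illustrative figure from this example, not measured data:

```python
queries_per_day = 10_000
wrong_context_reduction = 0.30   # 30% fewer wrong contexts retrieved
cost_per_escalation = 5.00       # example support-escalation cost
indexing_cost = 4.19             # one-time cost computed above

fewer_escalations = queries_per_day * wrong_context_reduction  # per day
daily_savings = fewer_escalations * cost_per_escalation
monthly_savings = daily_savings * 30
roi_multiple = monthly_savings / indexing_cost

print(f"{fewer_escalations:.0f} fewer escalations/day, "
      f"${daily_savings:,.0f}/day, ${monthly_savings:,.0f}/month, "
      f"~{roi_multiple:,.0f}x the indexing cost")
```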

*NOTE: all the numbers here are for reference purposes only. When you implement these changes in your own applications, the ROI and returns may vary, but contextual retrieval should still bring well-needed optimizations in customer satisfaction, performance, and cost across your AI workloads.*