
Contextual Retrieval - Anthropic's Approach

Chunks lose context when split from the original document


Chunking was introduced to handle large documents: feeding an entire large piece of information to an LLM blows past context limits and hits API rate limits. The trade-off is that when you split the original document into chunks, each chunk loses the context of the surrounding document.

Example :

Original Document: Company Financial Report Q3 2024

  1. Revenue increased 15% YoY to $2.5B
  2. Cost of goods sold remained stable at 40% of revenue
  3. Operating expenses decreased by 5% due to efficiency improvements

Traditional Chunking -

  1. Chunk 1 - Revenue increased 15% year on year to $2.5B
  2. Chunk 2 - Cost of goods sold remained stable at 40% of revenue
  3. Chunk 3 - Operating expenses decreased 5% due to efficiency improvements
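Traditional chunking is usually just a fixed-size split over the raw text. A minimal sketch in Python (the chunk sizes and sample text are illustrative, not a production chunker):

```python
def chunk_text(text: str, chunk_size: int = 80, overlap: int = 10) -> list[str]:
    """Naive fixed-size chunking: slide a window of chunk_size characters
    across the text, with a small overlap so sentences are cut less harshly."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

report = (
    "Company Financial Report Q3 2024. "
    "Revenue increased 15% YoY to $2.5B. "
    "Cost of goods sold remained stable at 40% of revenue. "
    "Operating expenses decreased by 5% due to efficiency improvements."
)
chunks = chunk_text(report)
# Only the first chunk carries "Q3 2024"; the chunk holding the operating
# expenses sentence has lost that context entirely.
```

Only the first chunk retains the report title; every later chunk is orphaned from it, which is exactly the problem described below.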

Problems with traditional chunking

Query - 'What were the Q3 operating expenses?'

Because chunk 3 never mentions Q3 2024, it might not be retrieved for this query, or it might be confused with expense figures from other quarters.

Anthropic's Solution: Add Context to Each Chunk

The prompt template:

    <document>
    {whole document}
    </document>
    Here is the chunk we want to situate within the whole document:
    <chunk>
    {chunk content}
    </chunk>
    Please give a short succinct context (50-100 tokens) to situate this chunk
    within the overall document for retrieval purposes.
    Answer only with the succinct context and nothing else.

Generated contextual chunk:

  1. Original chunk: "Operating expenses decreased 5% due to efficiency improvements"
  2. Enhanced chunk: "This chunk is from the Company Financial Report Q3 2024. Operating expenses decreased 5% due to efficiency improvements."
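In code, the context-generation step is one LLM call per chunk at indexing time. A sketch of the prompt assembly, where `generate` is a hypothetical stand-in for whatever LLM client you use (it is an assumption, not a real API):

```python
def build_context_prompt(document: str, chunk: str) -> str:
    """Assemble the contextualization prompt: whole document,
    then the chunk to situate, then the instruction."""
    return (
        f"<document>\n{document}\n</document>\n"
        "Here is the chunk we want to situate within the whole document:\n"
        f"<chunk>\n{chunk}\n</chunk>\n"
        "Please give a short succinct context (50-100 tokens) to situate this "
        "chunk within the overall document for retrieval purposes. "
        "Answer only with the succinct context and nothing else."
    )

def contextualize(document: str, chunk: str, generate) -> str:
    """Prepend the generated context to the chunk before embedding/indexing.
    `generate(prompt) -> str` is a placeholder for your LLM call."""
    return f"{generate(build_context_prompt(document, chunk))} {chunk}"
```

Since the full document is resent for every chunk, this step pairs naturally with prompt caching to cut the repeated input cost.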

With the added context, retrieval now works:

QUERY - "What were the Q3 2024 operating expenses?"

  1. Traditional chunk: might miss or rank low
  2. Contextual chunk: high relevance (contains "Q3 2024" and "operating expenses")
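To make the ranking difference concrete, here is a toy keyword-overlap scorer, a crude stand-in for real embedding or BM25 similarity (the scoring function is an illustrative assumption, not Anthropic's method):

```python
import re

def overlap_score(query: str, chunk: str) -> int:
    """Count distinct query words that also appear in the chunk
    (case-insensitive). A crude proxy for retrieval similarity."""
    words = lambda s: set(re.findall(r"\w+", s.lower()))
    return len(words(query) & words(chunk))

query = "What were the Q3 2024 operating expenses?"
traditional = "Operating expenses decreased 5% due to efficiency improvements"
contextual = ("This chunk is from the Company Financial Report Q3 2024. "
              "Operating expenses decreased 5% due to efficiency improvements")

print(overlap_score(query, traditional))  # 2 ("operating", "expenses")
print(overlap_score(query, contextual))   # 5 (adds "the", "q3", "2024")
```

Even this crude scorer ranks the contextual chunk higher, because the added sentence carries exactly the terms the query uses.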

COST BENEFIT ANALYSIS:

  1. Document: 10,000 tokens in total
  2. Chunks : 40 chunks of 250 tokens each

CONTEXT GENERATION

  1. Input: 10,000 (document) + 250 (chunk) = 10,250 tokens per chunk
  2. Output: ~75 tokens of context per chunk
  3. Total input: 40 x 10,250 = 410,000 tokens
  4. Total output: 40 x 75 = 3,000 tokens

At $0.01/1K input tokens and $0.03/1K output tokens:

  1. Input cost: 410 x $0.01 = $4.10
  2. Output cost: 3 x $0.03 = $0.09

Total: $4.19 per document (a one-time indexing cost)
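The arithmetic can be checked with a short script; the per-token prices and token counts are the illustrative figures from this example, not a real provider's price list:

```python
doc_tokens = 10_000      # the whole document is resent with every chunk
chunk_tokens = 250
num_chunks = 40
context_tokens = 75      # generated context per chunk

input_tokens = num_chunks * (doc_tokens + chunk_tokens)  # 40 x 10,250
output_tokens = num_chunks * context_tokens              # 40 x 75

input_cost = input_tokens / 1000 * 0.01    # $0.01 per 1K input tokens
output_cost = output_tokens / 1000 * 0.03  # $0.03 per 1K output tokens
total_cost = input_cost + output_cost
print(f"${total_cost:.2f}")  # $4.19 one-time indexing cost per document
```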

Benefits: improved retrieval accuracy, with gains of roughly 20-25 percentage points.

Before: 67-70% context precision. After: 85-90% context precision.

That translates into roughly a 30-40% reduction in retrieval errors.

Real-world example: customer support at a company.

  1. 10,000 queries per day
  2. 30% fewer wrong contexts -> 3,000 fewer escalations per day
  3. Cost per escalation (for example): $5
  4. Daily savings: $15,000
  5. Monthly savings: $450,000

ROI: a $4.19 one-time indexing cost vs. $450K in monthly savings, a return on the order of 100,000x.
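The same back-of-the-envelope savings as a script; every number is the illustrative figure from this example, not measured data:

```python
queries_per_day = 10_000
wrong_context_reduction = 0.30   # 30% fewer wrong contexts retrieved
cost_per_escalation = 5.00       # example support-escalation cost
indexing_cost = 4.19             # one-time cost computed above

fewer_escalations = queries_per_day * wrong_context_reduction  # per day
daily_savings = fewer_escalations * cost_per_escalation
monthly_savings = daily_savings * 30
roi_multiple = monthly_savings / indexing_cost

print(f"{fewer_escalations:.0f} fewer escalations/day, "
      f"${daily_savings:,.0f}/day, ${monthly_savings:,.0f}/month, "
      f"~{roi_multiple:,.0f}x the indexing cost")
```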

*NOTE: all the numbers here are for reference purposes only. When you implement these changes in your own applications, the ROI and returns may vary, but contextual retrieval should still bring well-needed optimizations in customer satisfaction, performance, and cost across your AI workloads.*