RAG Made Ridiculously Simple: How AI Looks Up Answers

Imagine you are taking an open-book exam versus a closed-book exam. In a closed-book exam you can only use what you have memorized, which gets very tough if you didn't prepare well. If the exam is open book, you can look up information in the textbook whenever you need it, so you are now powered by all the knowledge in the book.

RAG is like giving AI models an open book exam.

Instead of relying only on what it learned during training (its memory), the model can search through external documents to find relevant information before answering a question.

Example: a customer service representative

Without RAG - They answer based only on their training and memory.
With RAG - They can search the company knowledge base, product manuals and recent updates before responding.

CORE PROBLEM RAG SOLVES

AI models have a knowledge cutoff date: a GPT-4 model trained on data up to 2023 doesn't know about events in 2024. The models also can't know your company's internal documents, your personal files or proprietary information.

Mathematical context: a model's context window has a limit (for example, 128k tokens for GPT-4 Turbo), so you can't fit an entire company database, all company documents, the product catalog and real-time information into that small window.
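To get a feel for the size problem, here is a rough back-of-the-envelope sketch. It assumes about 4 characters per token, which is only a rule of thumb for English text and varies by tokenizer, and the corpus size (2,000 documents of 5,000 characters) is a made-up example, not a real dataset.

```python
# Assumption: ~4 characters per token (rough rule of thumb, varies by tokenizer).
CHARS_PER_TOKEN = 4


def estimated_tokens(num_chars: int) -> int:
    """Very rough token estimate from a character count."""
    return num_chars // CHARS_PER_TOKEN


def fits_in_context(num_chars: int, context_window_tokens: int = 128_000) -> bool:
    """Check whether text of this size would fit in the context window."""
    return estimated_tokens(num_chars) <= context_window_tokens


# Hypothetical corpus: 2,000 documents of about 5,000 characters each.
corpus_chars = 2_000 * 5_000
corpus_tokens = estimated_tokens(corpus_chars)

print(corpus_tokens)                   # estimated tokens for the whole corpus
print(fits_in_context(corpus_chars))   # does the whole corpus fit at once?
print(fits_in_context(5_000))          # does a single document fit?
```

Even this modest hypothetical corpus is many times larger than the window, which is why the system has to pick out a few relevant pieces instead of sending everything.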

RAG architecture is a two-step process:

Retrieval - Finding relevant information. Before answering a question, the system searches through documents to find the most relevant pieces of information.
Generation - Creating the answer. The model receives both your question and the retrieved information, then generates an answer based on the combined context.
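The two steps can be sketched in a few lines of Python. This is a toy illustration, not a production retriever: it scores documents by simple word overlap, and the names `retrieve` and `build_prompt` and the sample documents are my own assumptions. The generation step is left as a prompt string, because the actual model call depends on which provider you use.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Step 1 (Retrieval): rank documents by how many question words they share."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep the best matches and drop documents with no overlap at all.
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(question: str, context_docs: list[str]) -> str:
    """Step 2 (Generation input): combine the question with the retrieved text."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base for the example.
docs = [
    "The laptop warranty lasts two years.",
    "Our office closes on public holidays.",
    "Laptop batteries can be replaced at any service center.",
]
question = "How long is the laptop warranty?"

retrieved = retrieve(question, docs)
prompt = build_prompt(question, retrieved)
print(prompt)  # this combined text is what would be sent to the model
```

Word overlap is the crudest possible relevance score. The TF-IDF math in the next section is one way to make that score smarter.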

RAG Mathematics:

Relevance scoring - When a retriever searches a collection of documents, it assigns a relevance score to each document or chunk.
Term-based retrieval math (TF-IDF)
Term Frequency (TF)
TF(term, document) = (number of times the term appears in the document) / (total number of terms in the document)

Example: the search term "laptop" appears 2 times in a document that has 10 terms in total
TF = 2 / 10 = 0.2

Inverse Document Frequency (IDF)
IDF(term) = log10(total documents / documents containing the term)

Example: there are 10,000 documents in total and 500 of them contain the search term "laptop"
Then IDF("laptop") = log10(10,000 / 500) = log10(20) = 1.301

TF-IDF score = TF x IDF = 0.2 x 1.301 = 0.260

Why does this matter? Common words (the, is, a) appear in almost every document, so their IDF is low and they carry little importance. This filters them out and keeps the search focused on the terms that matter.
Rare, specific words appear in few documents, so their IDF is high and they carry high importance.
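The worked numbers above can be reproduced with a short Python sketch. It assumes a base-10 logarithm, since that is what gives 1.301 for the "laptop" example. The sample sentence and the count for the word "the" (9,900 of 10,000 documents) are made up for illustration.

```python
import math


def term_frequency(term: str, document_terms: list[str]) -> float:
    """TF = times the term appears / total number of terms in the document."""
    return document_terms.count(term) / len(document_terms)


def inverse_document_frequency(total_documents: int, documents_with_term: int) -> float:
    """IDF = log10(total documents / documents containing the term)."""
    return math.log10(total_documents / documents_with_term)


# Hypothetical 10-term document in which "laptop" appears twice.
doc_terms = "this laptop is a fast laptop with a great screen".split()

tf = term_frequency("laptop", doc_terms)          # 2 / 10
idf = inverse_document_frequency(10_000, 500)     # log10(20)
score = tf * idf

# Assumed count: a common word such as "the" appears in 9,900 of 10,000 documents.
common_idf = inverse_document_frequency(10_000, 9_900)

print(f"TF={tf:.3f} IDF={idf:.3f} TF-IDF={score:.3f}")
print(f"IDF of a common word: {common_idf:.4f}")
```

The common word ends up with an IDF close to zero, so it barely affects the score no matter how often it shows up in a document.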

This is how a simple RAG works. In other blog posts, I will cover the various search algorithms used and try to explain them in simple language.
