How AI Assistants Remember: A Practical Memory Architecture Guide

How do AI assistants remember users? Explore memory architecture, context management, and hierarchical memory design powering modern AI agents.


I'll take a personal AI assistant as the example. It's a good use case because the assistant needs to remember your preferences and build on your old conversations to deliver a continuous conversation experience.

User: "I'm planning a trip to Tawang"

Agent: "Great! I remember you prefer boutique hotels and vegetarian restaurants. When are you thinking of going?"

User: "How did you know my preferences?"

Behind the scenes, three layers of memory work together.

Long-term memory (PostgreSQL):

SQL
SELECT * FROM user_preferences WHERE user_id = 'user_123';

Results:
- accommodation_preference: 'boutique hotels'
- dietary_restrictions: 'vegetarian'
- travel_style: 'cultural immersion'
- budget_tier: 'mid-range'
- previous_destinations: ['Singapore', 'Dubai', 'Bentonville']

Short-term memory (current conversation): picks up from the ongoing exchange with the user.

User: "I'm planning a trip to Tawang"
Agent: "Great! I remember you prefer..."

Internal knowledge: what the LLM already knows from its training data.
- Tawang is a high-altitude town in Arunachal Pradesh, India
- It is home to the Tawang Monastery, one of the largest monasteries in India
- The surrounding region is known for its Himalayan terrain and Buddhist cultural heritage
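The three layers above can be sketched in code. This is a minimal illustration, assuming a hypothetical `load_long_term_memory()` helper in place of the real PostgreSQL query; all names and strings here are illustrative, not a real API.

```python
# Sketch: assembling the agent's prompt from long-term memory,
# short-term memory, and the current user message.

def load_long_term_memory(user_id):
    # In production this would run the PostgreSQL query shown above;
    # here we return a hard-coded row for illustration.
    return {
        "accommodation_preference": "boutique hotels",
        "dietary_restrictions": "vegetarian",
        "travel_style": "cultural immersion",
    }

def build_prompt(user_id, short_term_messages, user_message):
    prefs = load_long_term_memory(user_id)
    pref_lines = "\n".join(f"- {k}: {v}" for k, v in prefs.items())
    history = "\n".join(short_term_messages)  # short-term memory
    return (
        "Known user preferences:\n" + pref_lines + "\n\n"
        "Conversation so far:\n" + history + "\n\n"
        "User: " + user_message
    )

prompt = build_prompt(
    "user_123",
    ['User: "I\'m planning a trip to Tawang"'],
    "Which hotels would suit me?",
)
print(prompt)
```

The LLM's internal knowledge needs no assembly step: it is baked into the model weights, so only the preference and conversation layers have to be injected into the prompt.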

There are various memory-management strategies in use; below is a very simple example I worked through to learn how it all fits together.

The problem: the context window is finite, and in long conversations the model can start to hallucinate or give inaccurate responses. Suppose the model's context window can retain 128,000 tokens in total.

  • Let's say my conversation is 50 messages long (roughly 30,000 tokens)
  • Context window is of 128,000 tokens
  • Remaining tokens are (128,000 - 30,000) = 98,000 tokens

Memory allocation strategy for the leftover 98,000 tokens:

  • System prompt: for example, 2,000 tokens
  • Recent messages (the last ~10 from my conversation): 8,000 tokens
  • Summarized history: 5,000 tokens
  • Retrieved relevant past exchanges: 10,000 tokens
  • User profile/preferences: 3,000 tokens
  • Retrieved documents (RAG): 70,000 tokens

That's how the memory allocation works; roughly, the budget breaks down as:

  • 60-70% to RAG
  • 2-3% to user preferences
  • 10-12% on past relevant exchanges
  • 5% on summarization
  • 8-10% on recent messages
  • 2-3% on System prompt

Summarization:

To keep the experience and conversation continuous, applications like Claude, GPT or Cursor have started summarizing conversations. The moment you see a message like "Summarizing conversation", the model is compressing the exchange so far: it preserves the essential information so the conversation can continue into a fresh token window, and holds the summary aside to fetch from whenever needed.

Example:

User: "Tell me what all the different kinds of assets our office has."
Agent: "We have 1,000 office desks, 1,500 office chairs, 12 vending machines..."
User: "What is the average age of the vending machines?"
Agent: "The average age of the vending machines is around 3 years..."
[... 38 more messages about features, comparisons, etc.]

Summarized (500 tokens):
- User is interested in the assets present in the office
- Key needs: vending machine age, warranty details and vendor details
- Concerns: cost of ownership, frequent failures and energy cost
- Budget approved, pending asset acquisitions
- Next step: schedule a presentation with the asset procurement team

Why this works:
- Preserves essential information
- Reduces token usage by roughly 90-95%
- Allows much longer conversations
- The agent can reference the summary whenever needed

So as a product manager, it's not about building an intelligent tool but about asking:

  • Do we remember users across sessions?
  • Are we building just a feature or a relationship with our customers?

With active thought given to memory management, the competitive edge shifts from intelligence to continuity.

The winning AI products won't just answer; they will remember, adapt, and grow with users.

This means the product should evolve based on the questions below; answering them steers an AI product in the right direction:

  • Should we invest in user memory early?
  • What data becomes long-term memory?
  • How do we earn user trust to store memory?

These are not feature discussions or decisions; they are matters of company-level strategic direction.