How AI Assistants Remember: A Practical Memory Architecture Guide

How do AI assistants remember users? Explore memory architecture, context management, and hierarchical memory design powering modern AI agents.


I'll take a personal AI assistant as the example. It's a good use case because the assistant needs to remember your preferences and build on your old conversations to deliver a continuous conversation experience.

User: "I'm planning a trip to Tawang"

Agent: "Great! I remember you prefer boutique hotels and vegetarian restaurants. When are you thinking of going?"

User: "How did you know my preferences?"

Behind the scenes, three layers of memory work together.

Long-term memory (PostgreSQL):

SQL
SELECT * FROM user_preferences WHERE user_id = 'user_123';

Results:
- accommodation_preference: 'boutique hotels'
- dietary_restrictions: 'vegetarian'
- travel_style: 'cultural immersion'
- budget_tier: 'mid-range'
- previous_destinations: ['Singapore', 'Dubai', 'Bentonville']

Short-term memory (current conversation): picks up from the ongoing exchange with the user.

User: "I'm planning a trip to Tawang"
Agent: "Great! I remember you prefer..."

Internal knowledge: what the LLM already knows from its training data.
- Tawang is a high-altitude town in Arunachal Pradesh, India
- It is home to the Tawang Monastery, one of the largest monasteries in India
- The surrounding region is known for its Himalayan terrain and Buddhist cultural heritage
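The three layers above can be sketched in code. This is a minimal illustration, assuming a hypothetical `load_long_term_memory()` helper in place of the real PostgreSQL query; all names and strings here are illustrative, not a real API.

```python
# Sketch: assembling the agent's prompt from long-term memory,
# short-term memory, and the current user message.

def load_long_term_memory(user_id):
    # In production this would run the PostgreSQL query shown above;
    # here we return a hard-coded row for illustration.
    return {
        "accommodation_preference": "boutique hotels",
        "dietary_restrictions": "vegetarian",
        "travel_style": "cultural immersion",
    }

def build_prompt(user_id, short_term_messages, user_message):
    prefs = load_long_term_memory(user_id)
    pref_lines = "\n".join(f"- {k}: {v}" for k, v in prefs.items())
    history = "\n".join(short_term_messages)  # short-term memory
    return (
        "Known user preferences:\n" + pref_lines + "\n\n"
        "Conversation so far:\n" + history + "\n\n"
        "User: " + user_message
    )

prompt = build_prompt(
    "user_123",
    ['User: "I\'m planning a trip to Tawang"'],
    "Which hotels would suit me?",
)
print(prompt)
```

The LLM's internal knowledge needs no assembly step: it is baked into the model weights, so only the preference and conversation layers have to be injected into the prompt.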

There are various memory-management strategies in use; below is a very simple example I worked through to learn how it all fits together.

The problem: the context window is finite, and in long conversations the model can start to hallucinate or give inaccurate responses. Suppose the model's context window can retain 128,000 tokens in total.

  • Let's say my conversation is 50 messages long (roughly 30,000 tokens)
  • Context window is of 128,000 tokens
  • Remaining tokens are (128,000 - 30,000) = 98,000 tokens

Memory allocation strategy for the leftover 98,000 tokens:

  • System prompt: for example, 2,000 tokens
  • Recent messages (the last ~10 from my conversation): 8,000 tokens
  • Summarized history: 5,000 tokens
  • Retrieved relevant past exchanges: 10,000 tokens
  • User profile/preferences: 3,000 tokens
  • Retrieved documents (RAG): 70,000 tokens

That's how the memory allocation works; roughly, the budget breaks down as:

  • 60-70% to RAG
  • 2-3% to user preferences
  • 10-12% on past relevant exchanges
  • 5% on summarization
  • 8-10% on recent messages
  • 2-3% on System prompt

Summarization:

To keep the experience and conversation continuous, applications like Claude, GPT or Cursor have started summarizing conversations. The moment you see a message like "Summarizing conversation", the model is compressing the exchange so far: it preserves the essential information so the conversation can continue into a fresh token window, and holds the summary aside to fetch from whenever needed.

Example:

User: "Tell me what all the different kinds of assets our office has."
Agent: "We have 1,000 office desks, 1,500 office chairs, 12 vending machines..."
User: "What is the average age of the vending machines?"
Agent: "The average age of the vending machines is around 3 years..."
[... 38 more messages about features, comparisons, etc.]

Summarized (500 tokens):
- User is interested in the assets present in the office
- Key needs: vending machine age, warranty details and vendor details
- Concerns: cost of ownership, frequent failures and energy cost
- Budget approved, pending asset acquisitions
- Next step: schedule a presentation with the asset procurement team

Why this works:
- Preserves essential information
- Reduces token usage by roughly 90-95%
- Allows much longer conversations
- The agent can reference the summary whenever needed

So as a product manager, it's not about building an intelligent tool but about asking:

  • Do we remember users across sessions?
  • Are we building just a feature or a relationship with our customers?

With active thought given to memory management, the competitive edge shifts from intelligence to continuity.

The winning AI products won't just answer; they will remember, adapt, and grow with users.

This means the product should evolve based on the questions below; answering them steers an AI product in the right direction:

  • Should we invest in user memory early?
  • What data becomes long-term memory?
  • How do we earn user trust to store memory?

These are not feature discussions or decisions; they are matters of company-level strategic direction.