Agent Memory
Watch or read first
- Daily Dose DS, "Memory Types in AI Agents" and "Importance of Memory for Agentic Systems" in the AI Engineering Guidebook (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- MemGPT / Letta paper: https://arxiv.org/abs/2310.08560 (Packer et al.)
- Zep (https://help.getzep.com/), Mem0 (https://docs.mem0.ai/), Letta (https://docs.letta.com/) docs - the three main memory frameworks in 2026.
TL;DR
Memory turns a stateless LLM into a stateful agent. Short-term memory = conversation window. Long-term memory persists across sessions. Episodic = past events. Semantic = learned facts. Procedural = learned how-to. Without memory, every interaction is a blank slate. With memory, agents personalize, learn, and accumulate knowledge.
The historical problem
An LLM call is a pure function: prompt in, text out. No state between calls. For one-shot tasks, this is fine. For assistants, coaches, tutors, companions, it is a disaster:
- User: "My name is David."
- (5 minutes later) Agent: "Who is this? What should I call you?"
In 2022-2023, teams hacked around this by dumping the entire conversation history into the prompt on every call. It worked up to ~8k tokens, then broke. Summarization workflows helped but lost detail.
As Daily Dose DS puts it, memory is not a property of the model itself; it is a system design problem, not a model feature.
How it works: the memory hierarchy
Inspired by human cognition, agent memory follows a tiered structure.
Short-term memory
- Exists only during one execution or session
- Implemented as a conversation buffer
- Includes recent messages, tool observations, scratch work
- Bounded by context window
Examples: the current chat turn, ReAct loop's running log, the system prompt for this session.
Long-term memory
- Persists across sessions
- Stored in an external system (vector DB, graph DB, SQL, files)
- Must be retrieved on demand (it does not fit fully in the prompt)
Sub-types (inspired by human memory):
Semantic memory
Facts and knowledge. "The company's return policy allows refunds within 30 days." These are indexed and retrieved when relevant.
Episodic memory
Past events and experiences. "Last Tuesday, David asked about refund policy X and we resolved it with case #12345." Useful for personalization and learning from past interactions.
Procedural memory
Learned how-to. Skills, instructions, workflows the agent internalized. Often stored as updated system prompts or learned tool usage patterns.
Entity memory
Tracks specific subjects (users, products, orders) with structured attributes. Like a CRM for the agent. "User David: location Israel, language FR, preferred channel email."
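A minimal sketch of an entity-memory record as a plain in-memory store. The field names (`entity_id`, `attributes`) and the last-write-wins update policy are illustrative assumptions, not any framework's actual schema.

```python
# Sketch of an entity-memory record backed by a plain dict store.
# Field names and update policy are illustrative, not a framework's schema.
from dataclasses import dataclass, field

@dataclass
class EntityRecord:
    entity_id: str              # e.g. "user:david"
    entity_type: str            # "user", "product", "order"
    attributes: dict = field(default_factory=dict)

    def update(self, **attrs):
        # Last write wins; a real system would version or timestamp attributes.
        self.attributes.update(attrs)

store: dict[str, EntityRecord] = {}

def upsert(entity_id: str, entity_type: str, **attrs) -> EntityRecord:
    rec = store.setdefault(entity_id, EntityRecord(entity_id, entity_type))
    rec.update(**attrs)
    return rec

upsert("user:david", "user", location="Israel", language="FR")
upsert("user:david", "user", preferred_channel="email")
```

Structured attributes, rather than free text, are what make entity memory queryable like a CRM.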
Contextual memory
Daily Dose DS's catch-all for keeping relevant context available across a session. Overlaps with short-term.
User memory
Specifically about the current user: preferences and interaction history. A specialization of entity memory.
The simulation problem
LLMs do not "remember" in a biological sense. The system simulates memory by:
- Deciding what to keep (not everything fits)
- Storing it externally
- Retrieving relevant pieces before each new model call
- Inserting them in the prompt
Every memory framework (Zep, Letta, Mem0) is a specific implementation of "what to keep, when to retrieve, how to insert".
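The four steps above can be sketched end to end. This is a toy: the "keep" policy is a hard-coded heuristic, retrieval is naive keyword overlap rather than vector search, and there is no actual model call.

```python
# Minimal sketch of the keep -> store -> retrieve -> insert loop.
# The retriever is naive keyword overlap, not a real vector search.
memory_store: list[str] = []

def keep(turn: str) -> bool:
    # Toy policy: persist only turns that state something about the user.
    return turn.lower().startswith(("my ", "i "))

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored items by word overlap with the query.
    overlap = lambda m: len(set(m.lower().split()) & set(query.lower().split()))
    return sorted(memory_store, key=overlap, reverse=True)[:k]

def build_prompt(query: str) -> str:
    # Insert retrieved facts ahead of the new user message.
    facts = retrieve(query)
    return "Known facts:\n" + "\n".join(facts) + f"\n\nUser: {query}"

for turn in ["My name is David.", "What's the weather?"]:
    if keep(turn):
        memory_store.append(turn)

print(build_prompt("What is my name?"))
```

Every framework listed above swaps in smarter versions of `keep`, `retrieve`, and `build_prompt`, but the loop shape is the same.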
Architecture patterns
Pattern 1: conversation buffer (short-term only)
[system prompt] + [all messages so far] -> LLM
Simplest. Breaks when conversation exceeds context window.
Pattern 2: summarizing buffer
[system prompt] + [summary of old turns] + [recent turns verbatim] -> LLM
A cheap LLM summarizes older turns. Keeps the prompt small. Loses detail.
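A sketch of the summarizing buffer, with the summarizer stubbed out: in production the summary step would be a cheap LLM call, here it just keeps the first clause of each evicted turn. Buffer size and truncation are toy choices.

```python
# Sketch of a summarizing buffer (Pattern 2). The "summary" is a stub for
# what would be a cheap LLM call in production.
from collections import deque

RECENT_TURNS = 4  # how many turns stay verbatim

class SummarizingBuffer:
    def __init__(self):
        self.summary = ""
        self.recent: deque[str] = deque(maxlen=RECENT_TURNS)

    def add(self, turn: str):
        if len(self.recent) == RECENT_TURNS:
            # Evict the oldest verbatim turn into the (lossy) summary.
            evicted = self.recent.popleft()
            self.summary = (self.summary + " | " + evicted.split(".")[0]).strip(" |")
        self.recent.append(turn)

    def prompt(self, system: str) -> str:
        return f"{system}\n[summary] {self.summary}\n" + "\n".join(self.recent)

buf = SummarizingBuffer()
for i in range(1, 7):
    buf.add(f"Turn {i}.")
print(buf.summary)   # "Turn 1 | Turn 2"
```

The loss of detail the text mentions is visible here: evicted turns survive only as truncated fragments.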
Pattern 3: RAG over conversation history
User query -> embed -> search past conversation -> retrieve relevant turns
[system prompt] + [retrieved turns] + [query] -> LLM
Scales to arbitrarily long history. Misses very recent turns if not indexed yet.
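Pattern 3 in miniature, with a toy bag-of-words "embedding" and cosine similarity standing in for a real embedding model plus vector DB:

```python
# Sketch of RAG over conversation history (Pattern 3). The bag-of-words
# "embedding" is a stand-in for a real embedding model + vector DB.
import math
from collections import Counter

history: list[str] = []

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(history, key=lambda turn: cosine(embed(turn), q), reverse=True)[:k]

history += ["user: my favorite color is blue",
            "user: book a flight to Paris",
            "assistant: flight booked"]
print(retrieve("what is my favorite color?", k=1))
```

Because retrieval is per-query, the prompt stays small no matter how long `history` grows, which is exactly the scaling property the pattern buys.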
Pattern 4: tiered memory (MemGPT / Letta)
RAM-like: current working context (in-prompt)
Disk-like: long-term store (vector DB)
The LLM itself issues commands like "save to long-term memory" or "recall from long-term memory".
The LLM manages its own memory. Brilliant, complex. Letta (open-source MemGPT) is the reference.
Pattern 5: external memory with background updates
Main loop: agent runs, actions happen.
Background: a "memory agent" reads actions and updates stores (user profile, facts, episodes).
Retrieval: per query, memory agent fetches relevant items and injects.
Zep, Mem0 use variants of this.
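A synchronous sketch of Pattern 5: the main loop only logs actions, and a separate "memory agent" pass distills them into a profile afterwards. The regex extraction stands in for the LLM-based extraction that Zep and Mem0 actually perform.

```python
# Sketch of Pattern 5: the main loop logs; a background "memory agent" pass
# extracts durable facts. Regexes stand in for LLM-based extraction.
import re

action_log: list[str] = []
user_profile: dict[str, str] = {}

def main_loop(turn: str):
    action_log.append(turn)   # no memory work inline: the agent stays fast

def memory_agent_pass():
    # Background job: scan logged actions, extract facts, update the store.
    for turn in action_log:
        if m := re.search(r"my name is (\w+)", turn, re.I):
            user_profile["name"] = m.group(1)
        if m := re.search(r"i live in (\w+)", turn, re.I):
            user_profile["location"] = m.group(1)
    action_log.clear()

main_loop("Hi, my name is David and I live in Israel.")
memory_agent_pass()
print(user_profile)   # {'name': 'David', 'location': 'Israel'}
```

Decoupling extraction from the main loop keeps per-turn latency flat; the trade-off is that very fresh facts are invisible until the next background pass.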
Relevance today (2026)
Memory is the bottleneck for "real" assistants
A 2026 AI assistant that forgets your name by turn 3 is a dealbreaker. ChatGPT, Claude, Gemini all launched memory features in 2024-2025 for this reason. The ecosystem followed.
Frameworks are still immature
Contrast with vector DBs (mature) and function calling (mature). Memory in 2026 is where vector DBs were in 2022: many options, no clear winner.
Main players:
- Zep - graph-based memory, strong for entities and relationships
- Letta (ex-MemGPT) - self-editing memory, from the UC Berkeley MemGPT research team
- Mem0 - simple API, popular for quick starts
- LangMem (LangChain) - integrated with LangGraph
- OpenAI Memory / Anthropic Memory beta - built into the API for specific products
Each makes different trade-offs. Benchmark on your use case.
RAG and memory converge
Daily Dose DS makes the arc clear:
- RAG (2020-2023): read-only, one-shot
- Agentic RAG: read-only via tool calls
- Agent Memory: read-write via tool calls
Modern agents treat the memory store as just another tool: search_memory, save_to_memory, update_memory. It is RAG with writes.
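What "memory as a tool" looks like in practice, using the function names from the text. The schemas follow the common JSON-Schema tool format; the dispatch table and in-memory list are toy stand-ins for a real store.

```python
# Sketch of memory exposed as ordinary tools. Schemas use the common
# JSON-Schema tool shape; the backing store is a toy in-memory list.
memory: list[str] = []

TOOLS = [
    {"name": "save_to_memory",
     "description": "Persist a durable fact about the user.",
     "parameters": {"type": "object",
                    "properties": {"fact": {"type": "string"}},
                    "required": ["fact"]}},
    {"name": "search_memory",
     "description": "Retrieve stored facts matching a query.",
     "parameters": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]}},
]

def dispatch(tool_name: str, args: dict):
    # The harness routes the model's tool calls to the memory store.
    if tool_name == "save_to_memory":
        memory.append(args["fact"])
        return "saved"
    if tool_name == "search_memory":
        return [f for f in memory if args["query"].lower() in f.lower()]
    raise ValueError(f"unknown tool: {tool_name}")

dispatch("save_to_memory", {"fact": "David's favorite color is blue"})
print(dispatch("search_memory", {"query": "color"}))
```

Seen this way, the only difference from agentic RAG is that one of the tools writes.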
Graph RAG and memory
Graph-based memory (Zep, Neo4j + LLMs) captures relationships: "David works at Geta.Team, lives in Israel, speaks FR/EN/HE". Graph queries unlock reasoning that flat vector search cannot.
Personalization is the killer app
Users do not care about "multi-agent orchestration". They care that the assistant remembers their kid's name, their project, their preferences. Memory, properly done, is the step from "cool demo" to "I use this every day".
Privacy
Memory is a privacy tightrope. Storing user messages long-term means regulatory exposure (GDPR, HIPAA). Serious apps:
- Encrypt at rest
- Support user-initiated deletion
- Redact PII before storage
- Separate per-tenant
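Two of the measures above sketched together: regex PII redaction before storage, and strict per-tenant scoping on read. The patterns are illustrative, nowhere near a complete PII detector.

```python
# Sketch of PII redaction before storage and per-tenant read scoping.
# The regex patterns are illustrative, not a complete PII detector.
import re

PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "<PHONE>"),
]

stores: dict[str, list[str]] = {}   # one isolated list per tenant

def redact(text: str) -> str:
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text

def save(tenant_id: str, text: str):
    stores.setdefault(tenant_id, []).append(redact(text))

def search(tenant_id: str, query: str) -> list[str]:
    # Reads never cross tenants: only this tenant's store is visible.
    return [t for t in stores.get(tenant_id, []) if query.lower() in t.lower()]

save("acme", "Contact me at david@example.com about the refund")
print(search("acme", "refund"))     # email already redacted at write time
print(search("globex", "refund"))   # other tenants see nothing: []
```

Redacting at write time, not read time, means raw PII never reaches the store at all.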
Critical questions
- Should long-term memory be managed by the LLM or by the system? (Trade-off: LLM-managed = more flexibility, less predictable cost; system-managed = bounded cost, less adaptive. Hybrid wins.)
- What happens when memory contradicts itself? (User changes preference. Old and new fact both stored. Retrieval surfaces both. Agent confused. Need conflict resolution: prefer newest, flag contradictions, or surface choice to user.)
- Does memory reduce the need for fine-tuning? (Partially. Memory adapts behavior via retrieval, no weight changes. But it does not teach new styles; fine-tuning still fits for that.)
- How much memory is enough? (Depends on use case. Personal assistant: thousands of facts. Enterprise support bot: millions. Start small.)
- Memory vs prompt caching, same thing? (No. Prompt caching is KV cache re-use for identical prefixes. Memory is a retrieval-based injection of relevant facts per query. Orthogonal, both useful.)
- Should memory be shared across users? (No for private info. Yes for anonymized learnings. Be explicit.)
Production pitfalls
- Unbounded growth. Every turn adds to memory. A year later, the store is enormous and retrieval is slow. Add eviction, summarization, archival.
- Stale facts override fresh ones. "User prefers X" from 6 months ago beats "user changed to Y" last week. Time-weight retrieval.
- Memory leakage between users. One user's fact retrieved for another. Scope every query strictly by user_id / tenant.
- No ground truth. You cannot tell if memory retrieved the right thing without eval. Build a memory eval suite.
- Hallucinated memory. LLM thinks it "remembers" something that was never stored. Always ground responses in actual retrieved content.
- Over-caching vs under-caching. Cached memory is stale; fresh memory is expensive. Tune the refresh cadence per memory type.
- Privacy blast radius. Memory stores contain raw user text. Treat like PII. Encryption, access control, retention limits.
- Compliance. GDPR "right to be forgotten" means per-user delete must work across all memory stores.
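The time-weighting fix for the "stale facts" pitfall can be sketched as an exponential decay on age. The half-life and scores are toy values; the shape of the fix is what matters.

```python
# Sketch of time-weighted retrieval: raw relevance is discounted by an
# exponential decay on age. Half-life and scores are toy values.
HALF_LIFE_DAYS = 30.0

def decayed_score(relevance: float, age_days: float) -> float:
    # Score halves every HALF_LIFE_DAYS.
    return relevance * 0.5 ** (age_days / HALF_LIFE_DAYS)

# (raw relevance, age in days, text): equal relevance, very different ages.
candidates = [
    (0.9, 180.0, "user prefers X"),      # 6 months old
    (0.9, 7.0,   "user changed to Y"),   # last week
]
ranked = sorted(candidates, key=lambda c: decayed_score(c[0], c[1]), reverse=True)
print(ranked[0][2])   # "user changed to Y" wins despite equal raw relevance
```

Picking the half-life per memory type (short for preferences, long for stable facts) is the same tuning problem as the over-caching vs under-caching pitfall above.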
Alternatives / Comparisons
| Approach | Write? | Across sessions? | Scales? | Complexity |
|---|---|---|---|---|
| No memory | No | No | - | Trivial |
| Conversation buffer | No | No | Small | Low |
| Summarizing buffer | No | No | Medium | Low |
| RAG over history | No | Yes | High | Medium |
| Entity memory | Yes | Yes | Medium | Medium |
| MemGPT / Letta | Yes | Yes | High | High |
| Zep / Mem0 | Yes | Yes | High | Medium |
| Provider-managed memory (ChatGPT, Claude memory) | Managed | Yes | High | Managed |
Mental parallels (non-AI)
- Human memory architecture: working memory (7 plus or minus 2 items), short-term (minutes), long-term (semantic, episodic, procedural). Agent memory maps directly.
- Note-taking: the LLM is the person. Short-term = scratch paper on the desk. Long-term = filed notebooks. You cannot fit the notebooks on the desk, so you fetch relevant pages.
- Database transactions: writes go to persistent storage; reads pull back into working memory. Classic OLTP pattern.
- CRM for a salesperson: the salesperson's memory is bad, so every customer interaction is logged. Before a call, the CRM surfaces relevant history. Same pattern as agent memory.
- The Pensieve (Harry Potter): extract a memory into an external vessel, revisit later. Agent memory externalizes what would not fit in the model's head.
Mini-lab
labs/agent-memory/ (to create):
- Build a personal assistant agent with three memory types:
- Short-term: conversation buffer (last 20 turns)
- Semantic: facts about the user (Mem0 or Zep)
- Episodic: past conversations (vector DB of embedded summaries)
- Simulate 10 sessions across 2 weeks.
- Test recall: "what did I ask you about last week?" "what is my favorite color?"
- Measure:
- Recall accuracy
- Memory store size growth
- Per-query cost
- Add a forgetting mechanism (decay weight on old episodes) and compare.
Stack: uv, langchain, mem0ai or zep-python, anthropic.
Further reading
Canonical
- Daily Dose DS, "Memory Types in AI Agents" (2025, paid): https://www.dailydoseofds.com/ai-engineering-guidebook/
- Packer et al., "MemGPT: Towards LLMs as Operating Systems" (2023) - https://arxiv.org/abs/2310.08560
- Zep blog: https://www.getzep.com/blog
- Mem0 docs: https://docs.mem0.ai
Related in this KB
Frameworks
- Zep - https://github.com/getzep/zep
- Letta (ex-MemGPT) - https://github.com/letta-ai/letta
- Mem0 - https://github.com/mem0ai/mem0
- LangMem (LangChain): https://langchain-ai.github.io/langmem/
- OpenAI Memory (product-level feature in ChatGPT and Assistants API): https://help.openai.com/en/articles/8590148-memory-faq
- Anthropic Memory (beta): https://www.anthropic.com/news