What an AI Agent Should Remember and What It Should Forget
There is a moment in almost every agent project where someone says, "We need memory."
Usually that sentence lands right after the first set of real user sessions. The agent forgot a preference. It repeated a question. It gave a response that was technically correct but completely detached from the customer's history, team conventions, or workflow stage.
So the team adds memory.
At first, that feels like progress. Store more transcripts. Save more facts. Push more context into the next turn.
Then the system starts to feel haunted.
It remembers things that no longer matter. It repeats guesses as if they were confirmed facts. It drags old preferences into new workflows. It treats one offhand sentence from three weeks ago like a governing rule.
That is the trap. Most memory failures do not come from forgetting too much. They come from remembering the wrong things in the wrong way.
If you want a useful agent, memory cannot be a dumping ground. It has to be a product decision.
The first mistake: calling everything "memory"
Most teams collapse three separate concerns into one bucket:
- Long-term user or team preferences
- Current workflow state
- Retrieved knowledge from documents or systems
Those are not the same thing.
If a user prefers bullet summaries instead of long prose, that is memory.
If an expense approval is currently waiting on finance and the last action was taken two hours ago, that is workflow state.
If the current refund policy changed last Tuesday and now lives in the operations handbook, that is retrieval.
When those categories blur together, the agent becomes unreliable in predictable ways. It pulls stale policy into a live decision. It treats a temporary workflow fact like a stable preference. It stores noise because it has no clear threshold for what deserves to persist.
An experienced system separates these on purpose.
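One way to make that separation concrete is to keep the three concerns in three distinct containers with different lifetimes. This is a minimal, hypothetical sketch (the field names are illustrative, not a real framework API): durable memory survives across sessions, workflow state expires with the job, and retrieval is deliberately absent from both.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """Illustrative sketch: three concerns, three containers, three lifetimes."""
    durable_memory: dict = field(default_factory=dict)   # stable preferences, survives sessions
    workflow_state: dict = field(default_factory=dict)   # expires when the job finishes
    # Changing external facts (e.g. the refund policy) are retrieved fresh,
    # never stored in either container.

    def end_workflow(self) -> None:
        # Workflow state expires with the job; durable memory survives.
        self.workflow_state.clear()

ctx = AgentContext()
ctx.durable_memory["summary_style"] = "bullets"                 # memory
ctx.workflow_state["expense_approval"] = "waiting_on_finance"   # state
ctx.end_workflow()
assert "summary_style" in ctx.durable_memory
assert "expense_approval" not in ctx.workflow_state
```

The point is not the data structure; it is that each container has its own expiry rule, so stale workflow facts cannot quietly harden into preferences.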
The memory test I use in practice
Before storing anything, I ask three questions:
- Will this still be useful later?
- Is it stable enough to survive the next interaction?
- Would I be comfortable showing it back to the user as something the system "knows"?
If the answer to any of those is no, I usually do not store it as memory.
That third question matters more than people think. A lot of agent memory is built as if it only exists for the machine. That is a mistake. If the memory cannot be explained, corrected, or revoked, it will eventually become a trust problem.
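The three questions above can be sketched as a single gate in front of the write path. This is an assumption-heavy illustration (the candidate fields are made up for the example); the real work is deciding how each flag gets set, but encoding the test forces the decision to happen at all.

```python
def should_store(candidate: dict) -> bool:
    """Sketch of the three-question memory test; field names are illustrative."""
    return (
        candidate.get("useful_later", False)          # will this still be useful?
        and candidate.get("stable", False)            # will it survive the next interaction?
        and candidate.get("explainable_to_user", False)  # could we show it back to the user?
    )

# An unstable guess fails the test; a confirmed, explainable preference passes.
assert not should_store({"useful_later": True, "stable": False, "explainable_to_user": True})
assert should_store({"useful_later": True, "stable": True, "explainable_to_user": True})
```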
What is worth remembering
The best agent memory is boring in the best possible way. It reduces friction without changing the nature of the task.
Good memory candidates include:
- Stable output preferences
- Preferred tools or environments
- Known naming conventions
- Approved workflow defaults
- Durable customer or team context that is hard to restate every time
Examples:
- "Use Azure examples before AWS unless asked otherwise."
- "This team wants rollout summaries first, raw logs second."
- "Default to staging for dry runs."
- "The customer refers to the integration gateway as the broker layer."
Notice what these all have in common. They are useful, narrow, and unlikely to be invalidated by the next turn.
They make future work smoother without pretending to be a full historical brain.
What should usually be forgotten
The system should forget much more aggressively than most teams expect.
Things I avoid storing as long-term memory:
- Every chat transcript
- Draft reasoning chains
- Unconfirmed assumptions
- Temporary task details
- Guessed preferences
- Emotional tone from a single interaction
This is where teams get into trouble. They build memory like a scrapbook and then act surprised when the agent starts acting like a scrapbook.
If someone says, "Maybe we should try a CSV export next time," that is not durable memory. It is conversational context. It may matter for the session, or for the current task, but until it is confirmed it does not deserve to survive as a standing preference.
Memory should be earned.
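"Earned" can be made mechanical: a guessed preference lands in session context, and promotion to durable memory is a separate, explicit step. A minimal sketch, assuming a confirmation signal exists somewhere in the product (the class and method names here are invented for illustration):

```python
class MemoryGate:
    """Sketch: unconfirmed observations stay in session context until promoted."""

    def __init__(self) -> None:
        self.session_context = {}   # expires with the session
        self.durable = {}           # survives across sessions

    def observe(self, key: str, value: str) -> None:
        # Guesses and offhand remarks live only in session context.
        self.session_context[key] = value

    def confirm(self, key: str) -> None:
        # Promotion to durable memory requires an explicit confirmation.
        if key in self.session_context:
            self.durable[key] = self.session_context.pop(key)

gate = MemoryGate()
gate.observe("export_format", "csv")    # "maybe we should try a CSV export"
assert "export_format" not in gate.durable
gate.confirm("export_format")           # the user later confirms the preference
assert gate.durable["export_format"] == "csv"
```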
Session memory is not long-term memory
One useful discipline is to distinguish between session context and durable memory.
Session context is what the agent needs to stay coherent during the current interaction. It may include open questions, active entities, temporary constraints, and partial progress. That context matters, but most of it should expire naturally when the task is done.
Durable memory is what should survive beyond the session because it meaningfully improves future work.
This sounds obvious, but a lot of systems skip the distinction and jump straight from "the model saw it" to "the system should remember it." That is how state leaks into preference and preference leaks into policy.
The cleaner pattern is:
- Keep active task context in workflow state
- Keep stable user or team facts in durable memory
- Keep changing external facts in retrieval
That separation makes debugging easier, and it also makes deletion and correction possible later.
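The routing rule above fits in a few lines. The kind labels here are illustrative; the useful property is the default: anything unclassified falls into session context and expires, rather than silently persisting.

```python
def route(fact_kind: str) -> str:
    """Sketch of the separation above; kind labels are illustrative."""
    routes = {
        "active_task": "workflow_state",       # task context stays in state
        "stable_user_fact": "durable_memory",  # stable facts persist
        "external_fact": "retrieval",          # changing facts are re-fetched
    }
    # Default: unclassified information expires with the session.
    return routes.get(fact_kind, "session_context")

assert route("active_task") == "workflow_state"
assert route("stable_user_fact") == "durable_memory"
assert route("external_fact") == "retrieval"
assert route("offhand_remark") == "session_context"
```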
Memory needs a correction model
If the system can remember, it also needs a way to be wrong safely.
That means:
- users should be able to inspect important remembered facts
- memories should be updateable
- some memories should expire automatically
- ownership should be clear for high-impact memory
I do not trust an agent memory system that only has a "write" path.
If the agent remembers, "This team always wants direct execution," and that assumption becomes false, the cost is not theoretical. The agent will keep acting with confidence based on something that used to be useful and is now dangerous.
Memory systems need maintenance just like knowledge systems do.
The difference between preference, state, and policy
One of the most useful habits in agent design is to label stored information by type.
I like a simple schema:
- preference
- workflow_state
- reference_fact
- policy_constraint
That forces clearer decisions.
A preference can usually be user-controlled.
Workflow state should often expire with the job.
A reference fact may need freshness checks.
A policy constraint should probably not live in user memory at all; it should come from an authoritative system or document source.
That last point is where many agent memory designs go wrong. Teams accidentally store policy in memory because it feels convenient. Then a policy changes, and the system quietly keeps using old rules because nobody thought of memory as a governance surface.
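Labeling by type pays off when each label carries a handling rule. A sketch, with the type names taken from the schema above and the handling strings invented for illustration:

```python
from enum import Enum

class MemoryType(Enum):
    PREFERENCE = "preference"
    WORKFLOW_STATE = "workflow_state"
    REFERENCE_FACT = "reference_fact"
    POLICY_CONSTRAINT = "policy_constraint"

def handling_rule(mtype: MemoryType) -> str:
    """Illustrative mapping from type to handling, per the rules above."""
    return {
        MemoryType.PREFERENCE: "user_controlled",
        MemoryType.WORKFLOW_STATE: "expire_with_job",
        MemoryType.REFERENCE_FACT: "freshness_check",
        MemoryType.POLICY_CONSTRAINT: "retrieve_from_authoritative_source",
    }[mtype]

# Policy constraints never live in user memory; they are always retrieved.
assert handling_rule(MemoryType.POLICY_CONSTRAINT) == "retrieve_from_authoritative_source"
assert handling_rule(MemoryType.WORKFLOW_STATE) == "expire_with_job"
```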
The real risk is not storage cost
People often worry about memory because of implementation complexity or database shape. Those matter, but they are not the biggest risk.
The biggest risk is behavioral drift.
The more unstructured memory you feed back into the system, the more opportunities you create for:
- stale assumptions
- over-personalization
- permission mistakes
- weird carryover between tasks
- responses that feel overconfident for the evidence available
Bad memory does not just make answers worse. It changes the posture of the system. The agent starts sounding like it knows more than it should, and users stop being able to tell what is retrieved, what is remembered, and what is invented in the moment.
That is when trust erodes.
A practical memory policy for small teams
If you are building an agent without a large platform team behind it, keep the memory policy simple.
I would start with these rules:
- Store only durable preferences and confirmed stable facts.
- Keep workflow progress in state, not long-term memory.
- Retrieve policies and changing operational facts from authoritative sources.
- Add expiration rules to memory that can reasonably drift.
- Give users or operators a way to correct stored memory.
- Log when memory is used in a high-impact decision.
That policy will feel conservative. Good. Conservative is usually how you prevent an agent from turning into a mythology generator.
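For a small team, the whole policy can start as a config object checked at write time. Everything below is a hypothetical sketch (the keys and categories are made up for the example); the value is having one place where "what are we allowed to store?" is answered.

```python
# Hypothetical policy config encoding the six rules above; names are illustrative.
MEMORY_POLICY = {
    "store": ["durable_preference", "confirmed_stable_fact"],
    "keep_in_workflow_state": ["task_progress"],
    "retrieve_only": ["policy", "changing_operational_fact"],
    "default_ttl_days": {"driftable_fact": 90},    # expiration for drift-prone items
    "correction_paths": ["user_edit", "operator_edit"],
    "audit": {"log_memory_use_on_high_impact": True},
}

def allowed_to_store(kind: str) -> bool:
    """Gate on the write path: only policy-approved kinds persist."""
    return kind in MEMORY_POLICY["store"]

assert allowed_to_store("durable_preference")
assert not allowed_to_store("chat_transcript")   # transcripts never persist
assert not allowed_to_store("policy")            # policy is retrieved, not stored
```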
The strongest memory systems feel quiet
When memory is working well, it does not announce itself very often.
The agent simply feels less repetitive. It formats things the way the team expects. It does not ask the same setup question every time. It handles continuity without making the conversation feel sticky or invasive.
That is the goal.
Not "never forget anything." Not "build a second brain." Not "remember every interaction forever."
Just enough continuity to make the system more useful, without storing more than you can explain or govern.
That is what a mature memory system looks like.
The short version
If you only remember one thing, make it this:
An agent should remember what reduces future friction, forget what was merely temporary, and retrieve what can change outside the conversation.
Once you internalize that distinction, memory stops looking like magic and starts looking like architecture.
And that is when it finally becomes useful.
