CONTROLLED AGENCY | Issue 08: The Memory Problem

How AI agents carry attacks across sessions, users, and time

Jun 02, 2026

What Persistent Memory Means Architecturally

Agentic AI systems implement persistent memory through several mechanisms, each with a distinct attack surface and persistence model.

The most common approach is retrieval-augmented memory. Past interactions are embedded into a vector store and retrieved when future interactions appear semantically relevant. The agent does not “remember” in the human sense. Instead, it retrieves fragments of prior context and incorporates them into its reasoning process. To the user, the system appears to remember; to a defender, the memory store becomes a persistent trust boundary that exists outside the immediate runtime of any individual session.

A second approach uses explicit memory storage. Agents are provided tools that allow them to write information to a persistent store, which remains available across future interactions. If an attacker can influence what is written into that store, they may be able to introduce information or instructions that influence later sessions long after the original interaction has ended.

A third persistence mechanism exists in systems that update behaviour through reinforcement pipelines, post-training adaptation processes, or other forms of model improvement informed by historical interactions. Unlike external memory stores, persistence in these architectures can become embedded within behavioural patterns rather than stored records. Detecting and reversing unwanted influence in these systems is often significantly more difficult because the persistence mechanism itself is less transparent.

Each of these approaches creates a different security challenge, requires different monitoring strategies, and presents different remediation paths when compromise occurs.

The Memory Poisoning Problem

Recent research has demonstrated that long-term memory systems can become targets for indirect prompt injection and memory poisoning attacks. Studies such as MemoryGraft and MemMorph have shown that malicious content introduced during one interaction can influence agent behaviour in future interactions by contaminating persistent memory stores.

The practical scenario is worth examining because it illustrates why this attack class differs from traditional session-bound attacks.

An attacker submits a support ticket to an organization’s AI-powered customer service agent. The ticket contains carefully crafted instructions embedded within otherwise legitimate content. During processing, the agent stores information derived from that interaction in its memory system.

The support ticket is closed, and the session ends.

Weeks later, a different user interacts with the same agent on an unrelated matter. The agent retrieves information from its memory store and incorporates it into its reasoning process. The influence of the original interaction has now crossed session boundaries and affects future behaviour without any further action from the attacker.

The critical observation is not that memory exists. The critical observation is that memory is frequently treated as a trusted context simply because it originated from the system itself. If an attacker can influence what enters memory, they may gain influence over future behaviour through a mechanism that survives session termination.

That assumption of trust is where the risk emerges.

Why Session-Level Detection Is Not Enough

The monitoring architectures discussed in earlier issues focus on what occurs within a single session. They are designed to detect dangerous behavioural trajectories before harmful actions occur.

Memory-based attacks challenge that model.

The session in which malicious memory is introduced may appear entirely benign. The agent processes a support request, stores information, and completes the interaction without triggering any behavioural threshold or safety intervention.

The attack manifests later.

The compromise exists in the persistence layer between sessions rather than in the observable sequence of actions occurring within a single session.

This distinction matters because it changes what defenders need to monitor. Session-level monitoring remains necessary, but it is unlikely to be sufficient for deployments that rely on persistent memory.

Organizations increasingly require visibility not only into what agents do during interactions but also into how the operational context that shapes those interactions evolves over time.

The Supply Chain Dimension

The memory problem is not the only manifestation of persistent trust.

In 2026, researchers and security vendors documented large-scale malicious skill and plugin campaigns targeting AI agent ecosystems. These campaigns involved attackers publishing hundreds of malicious tools disguised as legitimate productivity, cryptocurrency, and developer utilities.

These incidents are not memory poisoning attacks in the strict sense. They are supply-chain attacks.

However, they share an important characteristic with memory compromise: both exploit trusted context.

In memory poisoning attacks, the trusted context is stored in memory.

In supply-chain attacks, the trusted context is the external tool ecosystem that agents are allowed to access and invoke.

The underlying lesson is similar. Agentic systems increasingly depend on external sources of context, capability, and decision support. When those sources are trusted by default, they become attractive targets for adversaries.

The Coding Agent Disclosure Wave

Recent disclosures affecting coding agents have demonstrated how broadly trust assumptions can impact agent security.

Researchers have reported vulnerabilities involving unsafe file handling, permission management weaknesses, and trust-boundary failures across multiple coding-agent platforms. While the technical details differ, the recurring pattern is consistent: agents are often granted access to files, tools, and resources that they trust more than they verify.

These incidents are not memory attacks; they are examples of a broader architectural challenge.

Whether the trusted object is:

a memory entry,
a file path,
a plugin,
an MCP server,
or a permission grant,

The security question remains the same:

How does the system determine that the information or capability it is relying upon is trustworthy?

Many recent agentic AI vulnerabilities can be traced back to failures in answering that question.

What Defending Against the Memory Problem Requires

Three controls deserve particular attention.

Memory Provenance Tracking

Every memory entry should include information about:

when it was created,
which session created it,
which inputs influenced it,
and which mechanism stored it.

This does not prevent memory poisoning. An attacker capable of influencing memory creation may also influence the associated metadata.

However, provenance significantly improves forensic analysis, accountability, and incident investigation by establishing a chain of custody for memory formation.

Cross-Session Memory Integrity Monitoring

Organizations should monitor memory stores for unusual patterns of modification.

Practical starting points include:

abnormal memory write frequency,
instruction-like language appearing in memory entries,
unusual memory growth patterns,
and memory updates immediately following retrieval of external content.

Monitoring behaviour surrounding memory updates is often more feasible than attempting to determine the intent of every memory entry.

Tool and Plugin Verification

Every skill, plugin, or external tool available to an agent should be verified before use.

This may include:

publisher verification,
cryptographic signatures,
permission reviews,
software bill of materials validation,
and behavioural analysis.

The objective is not to eliminate supply-chain risk. It is to reduce the probability that untrusted capabilities become trusted by default.

The Governance Gap

Current governance efforts are increasingly converging around several recurring themes:

agent identity,
authorization,
auditability,
accountability,
and human oversight.

These represent significant progress.

However, guidance surrounding cross-session behavioural integrity remains comparatively immature.

Most governance discussions focus on what an agent did during an interaction.

The memory problem raises a different question:

Can we verify that the context shaping an agent’s decisions has not been compromised between interactions?

An agent may pass every session-level audit requirement while still operating on corrupted memory, compromised tools, or manipulated behavioural context accumulated from previous sessions.

The governance challenge is therefore expanding.

The industry is beginning to develop mechanisms for understanding what agents do.

The next challenge is understanding what agents have been caused to remember.

Closing

The field does not have this solved.

Session-level monitoring, behavioural enforcement, agent identity, and auditability remain essential foundations. None of them directly addresses the problem of persistent influence across sessions.

Memory provenance tracking, cross-session integrity monitoring, and tool verification represent early attempts to address that challenge. They are not yet standard practice. They are not consistently implemented across production deployments. And they remain only partially addressed within current governance frameworks.

The industry’s first governance challenge was understanding what agents do.

Its next challenge may be understanding what agents remember.

Miracle Owolabi, AI security researcher, publishes Controlled Agency. Subscribe at arksher.substack.com

Miracle Owolabi

Discussion about this post

Ready for more?