Controlled Agency | Issue 04: The Enforcement Gap
Monitoring tells you what happened. Enforcement changes what happens.
These are not the same problem. They require different architectures, different data structures, different intervention logic, and different failure modes. Most teams building agentic AI systems are solving the first problem and calling it the second.
This issue is about what building behavioral enforcement actually requires: the engineering problems, the structural constraints, and the implementation tradeoffs that determine whether your enforcement layer does what it claims.
Two constraints enforcement has that monitoring does not
Enforcement introduces two constraints that monitoring does not have.
The first is the real-time requirement: A monitoring system that produces a report six hours after a session completes is useful. An enforcement system that produces an intervention six hours after a session completes is useless. Enforcement operates on the agent’s execution timeline, not on the analyst’s review timeline. The enforcement layer must evaluate behavioral state, make a decision, and execute an intervention, all within the latency budget of the agent’s reasoning loop. For most deployed systems, that budget is measured in seconds, sometimes less.
The second is the state requirement: Monitoring can be stateless, with each event evaluated independently against a ruleset. Enforcement designed to act on behavioral trajectories cannot be stateless. To enforce against a pattern that accumulates across turns, you need to maintain a representation of what has accumulated. That representation is the session behavioral state, and managing it correctly is the central engineering challenge of behavioral enforcement.
The session behavioral state object
The foundational data structure in any behavioral enforcement layer is the session behavioral state, a running representation of what this agent has done in this session, updated after each turn, queryable by the enforcement logic at any point during execution.
At a minimum, four things.
A safety score time series: For each turn in the session, a numeric representation of the safety posture of that turn’s action. The scoring function matters; it needs to be sensitive to the specific failure modes relevant to your deployment context, not a generic toxicity score repurposed from a different problem domain.
A delta sequence: The turn-by-turn change in safety score, δ(t) = S(t) − S(t−1). This is the signal that distinguishes gradual drift from stable behavior. A safety score of 0.6 means something very different if the previous score was 0.9 than if the previous score was 0.5. The absolute value tells you where you are; the delta tells you which direction you are moving, and how fast.
A tool invocation log: A structured record of every tool the agent called, in sequence, with the parameters passed and the outputs returned. This is the action layer of the session: what the agent actually did to external systems, not just what it said. Behavioral enforcement that does not include the tool invocation state is enforcement over the reasoning layer only. The consequential actions are in the tool layer.
A turn classification sequence: For each turn, a categorical label (benign, anomalous, escalation-eligible, constraint-violation) is assigned by the enforcement layer’s detection logic as the session progresses. This sequence is what the intervention logic reads. It is how the enforcement layer knows whether the current state of the session warrants a response.
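A minimal Python sketch of the four components held in one object may make the structure concrete. The names (SessionBehavioralState, ToolCall, record_turn) are hypothetical, the safety scoring function is assumed to run upstream, and setting the first turn's delta to 0.0 is an arbitrary convention rather than anything the text prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Literal

# Label set taken from the four categories named in the text.
TurnLabel = Literal["benign", "anomalous", "escalation-eligible", "constraint-violation"]


@dataclass
class ToolCall:
    name: str
    params: dict[str, Any]
    output: Any


@dataclass
class SessionBehavioralState:
    session_id: str
    scores: list[float] = field(default_factory=list)                    # S(t) per turn
    deltas: list[float] = field(default_factory=list)                    # δ(t) = S(t) − S(t−1)
    tool_log: list[tuple[int, ToolCall]] = field(default_factory=list)   # (turn index, call)
    labels: list[TurnLabel] = field(default_factory=list)                # one label per turn

    def record_turn(self, score: float, label: TurnLabel,
                    tool_calls: list[ToolCall] | None = None) -> None:
        """Append one turn's observations; the first turn's delta is 0.0 by convention."""
        turn = len(self.scores)
        delta = score - self.scores[-1] if self.scores else 0.0
        self.scores.append(score)
        self.deltas.append(delta)
        self.labels.append(label)
        for call in tool_calls or []:
            self.tool_log.append((turn, call))
```

With the numbers from the delta example, a 0.6 recorded after 0.9 stores δ ≈ −0.30, while a 0.6 recorded after 0.5 would store δ ≈ +0.10.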
This object is not large. For a typical agent session of 15 to 20 turns, the behavioral state object is kilobytes, not megabytes. The engineering challenge is not storage; it is consistency. The object must be updated atomically after each turn, must be readable by the enforcement logic without blocking the agent’s execution, and must survive agent handoffs intact.
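One way to meet the atomicity and non-blocking requirements, sketched under the assumption of a single-process Python deployment, is a copy-on-write store. Writers apply each turn under a lock and publish a new immutable snapshot; readers take the current snapshot without locking, so they see either the previous turn or the new one, never a half-applied update. SessionStateStore and StateSnapshot are illustrative names, not an established API.

```python
from __future__ import annotations

import json
import threading
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StateSnapshot:
    turn: int
    scores: tuple[float, ...]
    deltas: tuple[float, ...]
    labels: tuple[str, ...]
    tool_calls: tuple[tuple[int, str], ...]   # (0-based turn index, tool name)


class SessionStateStore:
    """Copy-on-write store: writers serialize on a lock, readers never block."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current = StateSnapshot(0, (), (), (), ())

    def apply_turn(self, score: float, label: str, tools: list[str]) -> StateSnapshot:
        with self._lock:                          # one writer at a time
            old = self._current
            delta = score - old.scores[-1] if old.scores else 0.0
            new = StateSnapshot(
                turn=old.turn + 1,
                scores=old.scores + (score,),
                deltas=old.deltas + (delta,),
                labels=old.labels + (label,),
                tool_calls=old.tool_calls + tuple((old.turn, t) for t in tools),
            )
            self._current = new                   # publish the whole turn in one assignment
            return new

    def read(self) -> StateSnapshot:
        # No lock needed: published snapshots are immutable.
        return self._current

    def to_json(self) -> str:
        return json.dumps(asdict(self.read()))
```

For a 20-turn session with short tool names, the JSON from to_json() is on the order of a kilobyte or two; payloads grow with how much tool input and output you retain. A multi-process deployment would need an external store with its own atomic write primitive instead of an in-process lock.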
Behavioral invariants
The session behavioral state tells you what is happening. Behavioral invariants tell you what should not happen.
A behavioral invariant is an explicit, machine-evaluable constraint on agent behavior for a given session context. It is different from a content policy in a specific way. A content policy is evaluated against a single output: does this response contain prohibited content? A behavioral invariant is evaluated against the session state: has the agent’s behavior across this session crossed a defined threshold?
Some examples of what behavioral invariants look like in practice.
A turning point invariant: if the delta sequence contains two consecutive turns where |δ(t)| ≥ 0.20, trigger a constraint injection; this fires on the detection of accelerating safety decay, not on any single output.
A tool sequence invariant: if the agent has invoked a data retrieval tool followed by an external API call within the same session, require explicit re-authorization before the API call proceeds. This catches a specific chaining pattern regardless of what the individual calls look like in isolation.
A depth invariant: if the agent has delegated to a sub-agent and the sub-agent has delegated further, flag for human review before any further tool invocations at delegation depth three or greater.
A time-window invariant: if more than 40 percent of the session’s tool invocations have occurred in the last three turns, treat the agent as accelerating and apply a soft throttle regardless of the content of those invocations.
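The four invariants above can be expressed as small predicates over session state. This is a hedged sketch: the State fields, the tool category strings ("data_retrieval", "external_api"), the convention that a top-level agent sits at delegation depth 1, and the returned intervention labels are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class State:
    deltas: list[float] = field(default_factory=list)
    tool_log: list[tuple[int, str]] = field(default_factory=list)  # (turn index, tool category)
    delegation_depth: int = 1                                       # 1 = top-level agent (assumed)
    turns: int = 0                                                  # turns completed so far


def turning_point(s: State, threshold: float = 0.20) -> str | None:
    # Two consecutive turns with |δ(t)| >= threshold.
    for a, b in zip(s.deltas, s.deltas[1:]):
        if abs(a) >= threshold and abs(b) >= threshold:
            return "constraint_injection"
    return None


def tool_sequence(s: State, pending_tool: str) -> str | None:
    # Earlier data retrieval in this session, and an external API call about to run.
    retrieved = any(kind == "data_retrieval" for _, kind in s.tool_log)
    if retrieved and pending_tool == "external_api":
        return "require_reauthorization"
    return None


def depth(s: State, max_depth: int = 3) -> str | None:
    # Orchestrator -> sub-agent -> sub-sub-agent reaches depth 3.
    return "human_review" if s.delegation_depth >= max_depth else None


def time_window(s: State, window: int = 3, share: float = 0.40) -> str | None:
    total = len(s.tool_log)
    if total == 0:
        return None
    recent = sum(1 for turn, _ in s.tool_log if turn >= s.turns - window)
    return "soft_throttle" if recent / total > share else None
```

Writing the rules as code exposes two properties worth knowing. The turning-point check uses |δ(t)|, so it also fires on two sharp improvements in a row; a decay-only variant would test δ(t) ≤ −0.20 instead. And the time-window rule fires trivially in the first few turns of any session, when every invocation is recent, which is the kind of behavior calibration has to catch.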
Behavioral invariants are the specification layer of behavioral enforcement. Without them, you have a detection capability but no enforcement policy.
Writing good invariants is difficult. They need to be specific enough to be actionable, general enough to catch the attack class they are designed to detect, and calibrated carefully enough that they do not fire on benign behavior so often that the enforcement layer becomes a source of friction. Invariant calibration is an empirical problem. It requires running the enforcement layer against real session data and tuning thresholds against measured false positive and false negative rates.
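A minimal sketch of that calibration loop for the turning-point invariant: replay labeled sessions, compute false positive and false negative rates at each candidate threshold, and pick the threshold with the lowest false negative rate under a false positive ceiling. The ground-truth labels are assumed to come from human review of past sessions, and every function name here is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LabeledSession:
    deltas: list[float]
    is_drift: bool          # ground truth from human review (assumed available)


def fires(deltas: list[float], threshold: float) -> bool:
    # Turning-point invariant: two consecutive |δ| >= threshold.
    return any(abs(a) >= threshold and abs(b) >= threshold
               for a, b in zip(deltas, deltas[1:]))


def error_rates(sessions: list[LabeledSession], threshold: float) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) at one threshold."""
    benign = [s for s in sessions if not s.is_drift]
    drift = [s for s in sessions if s.is_drift]
    fp = sum(fires(s.deltas, threshold) for s in benign)
    fn = sum(not fires(s.deltas, threshold) for s in drift)
    fpr = fp / len(benign) if benign else 0.0
    fnr = fn / len(drift) if drift else 0.0
    return fpr, fnr


def pick_threshold(sessions: list[LabeledSession], thresholds: list[float],
                   max_fpr: float) -> float | None:
    """Lowest false negative rate among thresholds whose false positive rate fits the ceiling."""
    candidates = []
    for t in thresholds:
        fpr, fnr = error_rates(sessions, t)
        if fpr <= max_fpr:
            candidates.append((fnr, t))
    return min(candidates)[1] if candidates else None
```

This selection rule is one reasonable choice among several; a team that weighs missed drift and spurious interventions differently might minimize a weighted cost instead.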
The intervention mechanism
When a behavioral invariant fires, the enforcement layer needs to do something. The set of things it can do is the intervention mechanism.
Interventions exist on a spectrum from soft to hard. Getting this right matters: an enforcement layer that always terminates sessions on invariant violation will be disabled by the engineering team within a week. An enforcement layer that only logs and alerts is not enforcing anything.
A workable intervention ladder.
At the soft end: constraint injection. The enforcement layer inserts a message into the agent’s context that narrows its action space for the next turn. Lightest touch, lowest reliability: a sufficiently drifted agent may reason around the constraint.
One level harder: session pause. The agent’s execution is held in a wait state. No further tool invocations proceed until a specified condition is met, such as timer expiry or human operator review. The session is not terminated and its state is preserved; if the reviewer clears the session, execution resumes from the exact point where it was paused. This is the right intervention for trajectory detection where a breach has not yet occurred.
Harder still: capability restriction. The agent’s tool access is modified mid-session. Specific tools are revoked while the session continues with a reduced capability set. Appropriate when the invariant that fired is specific to a tool chaining pattern.
At the hard end: session termination with state snapshot. The session ends immediately. The full behavioral state object is snapshotted and written to the forensic audit layer. The termination event is logged with the specific invariant that triggered it.
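The ladder can be sketched as an ordered enum plus a per-session controller. The controller below is hypothetical: injected constraints go to a list rather than a real context window, the pause is a threading.Event that tool calls wait on, and the forensic snapshot is held in memory instead of being written to an audit store.

```python
from __future__ import annotations

import enum
import threading
from typing import Iterable


class Intervention(enum.IntEnum):
    # Ordered from soft to hard so comparisons read as "at least this severe".
    CONSTRAINT_INJECTION = 1
    SESSION_PAUSE = 2
    CAPABILITY_RESTRICTION = 3
    TERMINATION = 4


class SessionControl:
    """Hypothetical per-session controller that applies one rung of the ladder."""

    def __init__(self, tools: Iterable[str]) -> None:
        self.tools = set(tools)
        self.context_constraints: list[str] = []
        self._running = threading.Event()
        self._running.set()                      # sessions start unpaused
        self.terminated = False
        self.snapshot: dict | None = None

    def apply(self, level: Intervention, *, reason: str = "",
              revoke: Iterable[str] = (), state: dict | None = None) -> None:
        if level is Intervention.CONSTRAINT_INJECTION:
            # Soft: narrow the next turn via an injected context message.
            self.context_constraints.append(reason)
        elif level is Intervention.SESSION_PAUSE:
            # Hold tool calls; state is untouched so execution can resume in place.
            self._running.clear()
        elif level is Intervention.CAPABILITY_RESTRICTION:
            # Revoke specific tools; the session continues with the rest.
            self.tools.difference_update(revoke)
        elif level is Intervention.TERMINATION:
            # Hard: snapshot state plus the triggering invariant, then stop.
            self.snapshot = {"state": dict(state or {}), "triggered_by": reason}
            self.terminated = True
            self._running.set()                  # release waiters so they observe termination

    def clear_pause(self) -> None:
        self._running.set()                      # reviewer cleared the session

    def may_call(self, tool: str, timeout: float | None = None) -> bool:
        """Wait while paused, then report whether the tool call may proceed."""
        if not self._running.wait(timeout):
            return False                         # still paused when the timeout expired
        return not self.terminated and tool in self.tools
```

Because Intervention is an IntEnum, a check such as level >= Intervention.CAPABILITY_RESTRICTION can gate which rungs require human sign-off before they are applied.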
The intervention ladder needs to be defined before deployment. The question of what the enforcement layer does when it fires must have a written answer for each invariant, at each severity level, before the system goes live.
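One way to make that written answer enforceable is to keep the policy as data and refuse to start if any invariant and severity pair lacks an entry. The invariant names, severity levels, and intervention labels below are illustrative assumptions.

```python
from __future__ import annotations

from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical rung names, soft to hard.
ALLOWED = {"constraint_injection", "session_pause", "capability_restriction", "termination"}

INVARIANTS = ("turning_point", "tool_sequence")

# The written answer: one intervention per (invariant, severity) pair.
POLICY: dict[tuple[str, Severity], str] = {
    ("turning_point", Severity.LOW): "constraint_injection",
    ("turning_point", Severity.MEDIUM): "session_pause",
    ("turning_point", Severity.HIGH): "termination",
    ("tool_sequence", Severity.LOW): "session_pause",
    ("tool_sequence", Severity.MEDIUM): "capability_restriction",
    ("tool_sequence", Severity.HIGH): "termination",
}


def policy_problems(policy: dict[tuple[str, Severity], str],
                    invariants: tuple[str, ...]) -> list[str]:
    """List every gap or unknown intervention; an empty list means the policy is complete."""
    problems = []
    for inv in invariants:
        for sev in Severity:
            action = policy.get((inv, sev))
            if action is None:
                problems.append(f"missing: {inv}/{sev.value}")
            elif action not in ALLOWED:
                problems.append(f"unknown intervention: {inv}/{sev.value} -> {action}")
    return problems


def require_complete_policy(policy: dict[tuple[str, Severity], str],
                            invariants: tuple[str, ...]) -> None:
    problems = policy_problems(policy, invariants)
    if problems:
        raise RuntimeError("enforcement policy incomplete: " + "; ".join(problems))
```

Calling require_complete_policy() in CI or at service startup turns a missing answer into a failed deploy rather than undefined behavior at runtime.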
The handoff problem
Multi-agent architectures introduce a failure mode that single-agent enforcement does not have: the stateless handoff.
When an orchestrator delegates a task to a worker agent, the worker starts fresh. It receives its task instructions. It does not receive the orchestrator’s session behavioral state. Whatever behavioral trajectory was building in the orchestrator session (whatever invariants were approaching their thresholds) is invisible to the worker. The worker’s enforcement layer begins evaluating from a clean baseline.
Closing this gap requires session state propagation as a first-class component of the handoff protocol. The worker agent must receive, alongside its task instructions, a serialized representation of the session behavioral state from the orchestrator. The worker’s enforcement layer must initialize from that inherited state rather than from zero.
This is not architecturally complex. It requires agreeing on a session state schema, implementing serialization and deserialization, and modifying the handoff message format to include the state payload. The complexity is in the standardization: multi-agent pipelines with components from different vendors or different teams need a shared schema that all enforcement layers can read and write.
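A minimal sketch of state propagation over a JSON handoff message, assuming a hypothetical schema with a version field. The worker refuses a handoff that carries no state or an unknown schema version; that fail-closed behavior is a design choice made here, not part of any existing protocol.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

SCHEMA_VERSION = "1"   # hypothetical; a shared specification would pin this


@dataclass
class BehavioralState:
    scores: list[float] = field(default_factory=list)
    deltas: list[float] = field(default_factory=list)
    labels: list[str] = field(default_factory=list)
    tool_log: list[dict] = field(default_factory=list)   # {"turn": int, "tool": str}
    delegation_depth: int = 1


def build_handoff(task: str, state: BehavioralState) -> str:
    """Orchestrator side: task instructions plus the serialized state payload."""
    return json.dumps({
        "task": task,
        "behavioral_state": {"schema_version": SCHEMA_VERSION, **asdict(state)},
    })


def accept_handoff(message: str) -> tuple[str, BehavioralState]:
    """Worker side: initialize enforcement from inherited state, not from zero."""
    msg = json.loads(message)
    payload = msg.get("behavioral_state")
    if payload is None:
        raise ValueError("handoff carries no behavioral state")
    if payload.pop("schema_version", None) != SCHEMA_VERSION:
        raise ValueError("unsupported behavioral state schema version")
    state = BehavioralState(**payload)    # unknown fields raise TypeError, also failing closed
    state.delegation_depth += 1           # the worker runs one level deeper
    return msg["task"], state
```

Incrementing delegation_depth on receipt lets a depth invariant like the one described earlier keep working across handoffs instead of resetting at each worker.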
There is no established protocol for session behavioral state propagation across agent handoffs. Every team that has thought seriously about multi-agent enforcement has implemented a bespoke solution. The field needs a shared specification, and it does not have one yet.
What breaks in a naive implementation
Three implementation mistakes appear repeatedly in enforcement layers built without accounting for the specific constraints of agentic execution.
The first is evaluating the state object synchronously in the agent’s critical path. If the enforcement logic must complete before the agent can proceed to the next turn, and it is doing anything computationally expensive, the enforcement layer becomes a latency bottleneck. The fix is to separate the state update, which must be synchronous, from the invariant evaluation, which can be asynchronous with a pre-computed risk budget the agent draws against.
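A sketch of that split, assuming a single agent loop in one Python process: the score is appended synchronously under a lock, evaluation runs on a background thread, and each turn draws a risk cost against a fixed budget. The agent blocks only when the risk taken since the last completed evaluation would exceed the budget. The per-turn cost model and the BudgetedEnforcer name are assumptions; the text does not specify how the budget is computed.

```python
from __future__ import annotations

import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

Evaluator = Callable[[tuple[float, ...]], Optional[str]]


class BudgetedEnforcer:
    """Hypothetical split: synchronous state update, asynchronous invariant evaluation."""

    def __init__(self, evaluate: Evaluator, budget: float = 2.0) -> None:
        self._evaluate = evaluate
        self._budget = budget                            # max unevaluated risk, assumed pre-computed
        self._scores: list[float] = []
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=1)   # evaluations run in turn order
        self._pending: list[tuple[Future, float]] = []   # (evaluation, risk cost)
        self._verdicts: list[str] = []

    @property
    def outstanding_risk(self) -> float:
        return sum(cost for _, cost in self._pending)

    def on_turn(self, score: float, cost: float = 1.0) -> None:
        # 1. Synchronous: the state update is cheap and must never be skipped.
        with self._lock:
            self._scores.append(score)
            snapshot = tuple(self._scores)
        # 2. Asynchronous: queue the (possibly expensive) invariant evaluation.
        self._pending.append((self._pool.submit(self._run, snapshot), cost))
        self._pending = [(f, c) for f, c in self._pending if not f.done()]
        # 3. Draw against the budget: block only while unevaluated risk exceeds it.
        while self._pending and self.outstanding_risk > self._budget:
            future, _ = self._pending.pop(0)
            future.result()                              # re-raises evaluator errors

    def _run(self, scores: tuple[float, ...]) -> None:
        verdict = self._evaluate(scores)
        if verdict is not None:
            with self._lock:
                self._verdicts.append(verdict)

    def verdicts(self) -> list[str]:
        with self._lock:
            return list(self._verdicts)

    def close(self) -> None:
        for future, _ in self._pending:
            future.result()
        self._pending.clear()
        self._pool.shutdown(wait=True)
```

A budget of 0 makes evaluation effectively synchronous again; larger budgets trade detection latency for agent latency.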
The second is designing the session state object for the happy path. Most implementations track what the agent did correctly. A behavioral enforcement layer needs to track specifically what the agent did that was anomalous: the turns that approached invariant boundaries, the tool invocations that matched suspicious patterns, and the delta values that spiked. The state object needs to be optimized for anomaly representation, not for audit completeness.
The third is treating intervention as a terminal state. Many enforcement layers are designed with session termination as the primary or only intervention. This creates a binary failure mode for the detection of gradual drift: the agent either runs unconstrained or gets killed. The intervention ladder exists to fill the space between those two outcomes with graduated responses that match the severity of what was detected.
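A sketch of an anomaly-first record, assuming the turning-point threshold of 0.20 from the invariant examples, a hypothetical 0.05 margin for "approaching the boundary", and a hypothetical set of suspicious tool names. Benign turns leave no trace beyond the turn counter; only spikes, near misses, and suspicious tool calls are stored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AnomalyRecord:
    turn: int
    kind: str          # "near_boundary" | "delta_spike" | "suspicious_tool"
    detail: str


@dataclass
class AnomalyState:
    spike_threshold: float = 0.20                 # same bound as the turning-point invariant
    near_margin: float = 0.05                     # hypothetical: "approaching" = within 0.05
    suspicious_tools: frozenset[str] = frozenset({"external_api"})   # hypothetical pattern set
    anomalies: list[AnomalyRecord] = field(default_factory=list)
    turns: int = 0

    def observe(self, delta: float, tools: list[str]) -> None:
        t = self.turns
        if abs(delta) >= self.spike_threshold:
            self.anomalies.append(AnomalyRecord(t, "delta_spike", f"δ={delta:+.2f}"))
        elif abs(delta) >= self.spike_threshold - self.near_margin:
            self.anomalies.append(AnomalyRecord(t, "near_boundary", f"δ={delta:+.2f}"))
        for tool in tools:
            if tool in self.suspicious_tools:
                self.anomalies.append(AnomalyRecord(t, "suspicious_tool", tool))
        self.turns += 1
```

In practice this would sit alongside, not replace, the full per-turn state; the point is that the records the intervention logic actually reads are first-class rather than reconstructed from an audit log.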
The implementation order
If you are building a behavioral enforcement layer for the first time, the order in which you build the components matters.
Start with the session state object. Define the schema. Implement the update logic. Verify that the state is correctly maintained across turns before you write a single line of invariant logic. A behavioral enforcement layer built on an inconsistent state representation will produce unreliable enforcement regardless of how well the invariants are written.
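Before any invariant logic exists, a consistency check like the following sketch can be run after every turn, or over replayed sessions, to confirm the state is being maintained correctly. The checks shown (aligned per-turn lengths, deltas that match S(t) − S(t−1), a tool log that only references recorded turns and stays in order) are assumptions about what "correct" means for a state with per-turn scores, deltas, and labels plus a turn-indexed tool log.

```python
from __future__ import annotations

import math


def check_state(scores: list[float], deltas: list[float], labels: list[str],
                tool_turns: list[int], tol: float = 1e-9) -> list[str]:
    """Return a list of consistency problems; an empty list means the state is sound."""
    problems = []
    n = len(scores)
    if not (len(deltas) == len(labels) == n):
        problems.append(f"length mismatch: scores={n}, deltas={len(deltas)}, labels={len(labels)}")
    # Every recorded delta after the first turn must equal S(t) - S(t-1).
    for t in range(1, min(n, len(deltas))):
        expected = scores[t] - scores[t - 1]
        if not math.isclose(deltas[t], expected, abs_tol=tol):
            problems.append(f"turn {t}: delta {deltas[t]} != S(t) - S(t-1) = {expected}")
    # Tool calls must point at turns that exist, in recorded order.
    if any(tt < 0 or tt >= n for tt in tool_turns):
        problems.append("tool call references a turn that was never recorded")
    if tool_turns != sorted(tool_turns):
        problems.append("tool log is out of order")
    return problems
```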
Then write the invariants. Start with two or three. Pick the ones that correspond to the specific failure modes you are most concerned about. Calibrate against real session data before you wire them to the intervention mechanism.
Then build the intervention mechanism, starting from the soft end of the ladder. Constraint injection and session pause are lower-stakes to get wrong than capability restriction and termination. Build confidence in the trigger logic before you connect it to hard interventions.
Last, build the handoff propagation layer. It is last not because it is least important (in a multi-agent pipeline, it may be the most important component), but because it depends on the state object schema being stable. Changing the schema after you have built serialization and deserialization is expensive.
Build in that order.
Test each layer before connecting it to the next.
The system you end up with will be more reliable and easier to debug than one assembled all at once.

