For AI Engineers

Beyond Guardrails: A Structural Architecture for Governing AI Agent Behavior

Rob Kline · March 30, 2026

The problem with “guardrails”

The AI agent community’s dominant governance concept — “guardrails” — is architecturally impoverished. A guardrail is a constraint applied at the boundary of an otherwise ungoverned system. It tells the agent what not to do. It says nothing about:

  • Who is responsible for each step of a multi-agent workflow
  • What decisions should be deterministic vs. requiring LLM judgment
  • What happens when things go wrong — structured exception handling, not retry loops
  • What governance attributes each activity carries (constraints, authorized inputs/outputs, controlled vocabulary, audit requirements)
  • How governance cascades through delegation — from organizational intent through orchestration to individual agent actions

Guardrails are the equivalent of putting a fence around a construction site and declaring it “governed.” The building code, the inspection process, the responsibility chain, the decision logic for structural choices — that’s governance. The fence is just a boundary.

What’s actually missing: execution governance

The gap is structural, not a matter of policy. Consider what your agent framework provides vs. what governed execution requires:

Your framework provides:

  • Tool access (function calling, MCP servers, computer use)
  • Agent coordination (multi-agent orchestration, handoffs, message passing)
  • State persistence (memory, context management, session state)

Governed execution requires:

  • Responsibility models — RACI (Responsible, Accountable, Consulted, Informed) per activity. Who owns the outcome? Who must be consulted? Who is informed?
  • Deterministic decision separation — Some decisions should never involve LLM inference. Compliance classifications, escalation routing, boundary enforcement — these need decision tables with deterministic evaluation, not probabilistic judgment.
  • Structured exception handling — Typed exceptions with governed responses. Boundary events. Compensation flows. Error escalation that routes to specific handlers, not generic catch blocks.
  • Activity-level governance attributes — 21 typed attributes per activity: authorization constraints, security classification, data lineage (SIPOC), performance targets, boundary constraints, controlled vocabulary requirements.
  • Governed decomposition — When an agent delegates to sub-agents, the delegation boundary needs governance: what is authorized, what constraints cascade, what evidence is required.
  • Audit trails — Not logging. Governance evidence: what was decided, by what logic, with what inputs, producing what outputs, against what constraints.
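The deterministic-decision point above can be sketched in plain Python. This is an illustrative stand-in for a FIRST-hit-policy decision table, not an excerpt from any spec or implementation; the `Finding` fields, rule set, and action names are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    COMPLIANT = "compliant"
    NON_COMPLIANT = "non_compliant"
    INDETERMINATE = "indeterminate"

@dataclass
class Finding:
    severity: Severity
    scope: str

# Hypothetical escalation table: (predicate, action) rows evaluated
# deterministically, top to bottom -- no LLM inference involved.
ESCALATION_RULES = [
    (lambda f: f.severity is Severity.NON_COMPLIANT and f.scope == "org-wide",
     "escalate_ciso"),
    (lambda f: f.severity is Severity.INDETERMINATE, "route_human_auditor"),
    (lambda f: True, "record_finding"),  # default rule
]

def route(finding: Finding) -> str:
    """FIRST-hit evaluation: the first matching rule wins."""
    for predicate, action in ESCALATION_RULES:
        if predicate(finding):
            return action
    raise LookupError("no rule matched")  # unreachable with a default rule
```

The point is the shape, not the rules: the routing logic is inspectable and repeatable, so the same finding always produces the same action.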

This is not a new problem. The discipline of Business Process Management has solved every one of these concerns for enterprise-scale operation across banking, healthcare, manufacturing, government, and insurance — for over two decades. The standards exist (BPMN 2.0, DMN 1.0, CMMN 1.0, all OMG). The professional body of knowledge exists (ABPMP BPM CBOK v4.0). The enterprise validation exists.

Context, memory, and intent are not the same thing

The AI agent community uses these terms interchangeably. They are structurally distinct:

Context is the vehicle — the information available to an agent at inference time. You can have rich context with zero governance. A system prompt full of organizational information is rich context. But if it doesn’t decompose into governance primitives, the agent is well-informed but ungoverned.

Memory is the mechanism for persistence. Memory accumulates information; it does not evaluate, prioritize, or structure that information for governance. A vector store full of organizational documents is memory. It is not intent governance.

Intent is the governance content. It decomposes into five structural primitives: Purpose (why this delegation exists), Direction (how to approach the work), Boundaries (what must never happen), End State (what counts as success), and Key Tasks (what work is authorized). Intent is what makes context governed and memory purposeful.
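The five primitives can be made concrete as a typed delegation payload. This is a minimal sketch assuming a Python agent framework; the field and method names follow the article's vocabulary, not the published specification.

```python
from dataclasses import dataclass

# Hypothetical encoding of the five intent primitives. An agent receives
# this alongside its context; context informs, intent governs.
@dataclass(frozen=True)
class Intent:
    purpose: str                   # why this delegation exists
    direction: str                 # how to approach the work
    boundaries: tuple[str, ...]    # what must never happen
    end_state: str                 # what counts as success
    key_tasks: tuple[str, ...]     # what work is authorized

    def authorizes(self, task: str) -> bool:
        """A task is in scope only if it is explicitly listed."""
        return task in self.key_tasks
```

A vector store can hold thousands of documents and still answer `authorizes()` for nothing; that is the distinction between memory and intent.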

“When practitioners say ‘the agent needs more context,’ they often mean ‘the agent’s behavior doesn’t serve our purpose’ — a governance gap wearing a mechanism label.”

The three-layer architecture

Three governance layers, each independently necessary:

Layer | Function | Question answered
Constitutional AI (substrate) | Training-time character governance | “What kind of agent is this?”
Intent Stack (governance context) | Four-layer runtime organizational governance | “What is authorized, by whom, why?”
BPM/Agent Stack (execution structure) | Execution governance with process discipline | “How does authorized work get done safely?”
Constitutional AI cannot know an organization’s specific constraints. The Intent Stack cannot specify how to coordinate agents through a gateway. The BPM/Agent Stack cannot determine whether a delegation was authorized. Each layer answers a question the others cannot.

Concrete example: governed computer use

A Claude agent performs a security compliance audit using computer-use capabilities — navigating an admin console, reading MFA configuration, classifying findings.

Without execution governance: The agent navigates, reads, and reports. If it misreads a configuration, it hallucinates a classification. If it encounters an ambiguous state, it guesses. If the admin console presents an “Edit” button, the guardrail says “don’t click it,” but the architecture doesn’t structurally exclude the action.

With BPM/Agent Stack governance:

  • 14-step BPMN process model with governance attributes per activity
  • Read-only boundary constraint — Edit, Modify, Delete controls are structurally excluded from authorized actions. The constraint is architectural, not advisory.
  • DMN decision table (UNIQUE hit policy) — 9 rules for MFA compliance classification. Zero ambiguity. The agent observes (LLM capability); the decision table classifies (deterministic logic). Observation and judgment are separated.
  • Escalation routing (FIRST hit policy) — Critical findings → CISO immediately. Indeterminate findings → human auditor. The priority ordering is a governance decision encoded in a decision table, not an LLM inference.
  • Rule 9: The escalation trigger — When the agent cannot read or interpret configuration, it does NOT guess. It escalates to a human with screenshots as evidence. This is governance-by-design, not guardrails-by-hope.
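The UNIQUE hit policy and the Rule 9 escalation path can be illustrated together. This is a simplified stand-in for the article's 9-rule table — the states, coverage values, and classifications below are invented for the sketch — but the enforcement logic is the point: exactly one rule may fire, and anything else escalates rather than guesses.

```python
# Illustrative UNIQUE-hit-policy table for MFA compliance classification.
# Keys are (mfa_state, coverage); "*" matches any coverage value.
RULES = {
    ("enforced", "all_users"):   "compliant",
    ("enforced", "admins_only"): "partially_compliant",
    ("disabled", "all_users"):   "non_compliant",
    ("unreadable", "*"):         "escalate_to_human",
}

def classify(mfa_state: str, coverage: str) -> str:
    """UNIQUE hit policy: exactly one rule must match.

    Zero matches or multiple matches both mean the table cannot
    classify the observation -- so the agent escalates with evidence
    instead of inventing an answer.
    """
    matches = [outcome for (state, cov), outcome in RULES.items()
               if state == mfa_state and cov in (coverage, "*")]
    if len(matches) != 1:
        return "escalate_to_human"
    return matches[0]
```

The agent observes the console (LLM capability) and reports `mfa_state` and `coverage`; the table, not the model, decides the classification.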

What this means for agent framework builders

The BPM/Agent Stack does not replace LangGraph, CrewAI, or any other framework. It provides the governance layer they are missing. The specification is grounded in open standards, not proprietary implementation:

  • For orchestration frameworks: Process structure elements (typed gateways, structured exception handling, event-driven flows) that formalize what you currently implement ad-hoc in code
  • For tool-use frameworks: Activity-level governance attributes that make each tool invocation auditable, bounded, and accountable
  • For multi-agent systems: Governed delegation interfaces where each agent-to-agent handoff carries governance content, not just task descriptions
  • For platform providers: The governance model above infrastructure — AWS Bedrock AgentCore provides deterministic policy enforcement; the BPM/Agent Stack provides the governance architecture that tells policy enforcement what to enforce
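For the multi-agent case, a governed handoff might look like the following. This is a hypothetical envelope, not the specification's interface: the class and field names are invented, and the one rule it enforces — sub-delegation may narrow authority but never widen it — is the cascade property described above.

```python
from dataclasses import dataclass

# Hypothetical agent-to-agent handoff: the delegation carries governance
# content (authorized actions, cascaded constraints, required evidence),
# not just a task description.
@dataclass(frozen=True)
class GovernedHandoff:
    task: str
    authorized_actions: frozenset[str]
    cascaded_constraints: tuple[str, ...]  # inherited from the delegator
    evidence_required: tuple[str, ...]     # audit artifacts the sub-agent must return

def delegate(parent: GovernedHandoff, task: str,
             actions: frozenset[str]) -> GovernedHandoff:
    """Sub-delegation may only narrow authority, never widen it."""
    if not actions <= parent.authorized_actions:
        raise PermissionError("delegation would widen authorized actions")
    return GovernedHandoff(task, actions,
                           parent.cascaded_constraints,
                           parent.evidence_required)
```

Constraints and evidence requirements ride along unchanged, so every hop in the delegation chain remains bounded by the original authorization.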

The specifications are published at intentstack.org and bpmstack.org, both CC BY 4.0.
