Purpose
In March 2026, Nicholas Carlini of Anthropic’s Frontier Red Team demonstrated AI models autonomously discovering zero-day vulnerabilities in production software — including Ghost CMS and the Linux kernel — at a speed and scale that security researchers described as a paradigm shift comparable to the discovery of stack smashing. Every finding documented in this analysis is based on Claude Opus 4.6, the model available today. Claude Mythos — a reported step change beyond Opus 4.6, currently in internal testing — makes every governance gap identified below starker and more urgent to address, because agents operating without governance infrastructure are about to become dramatically more powerful.
The security community’s response has focused on capability: how powerful are these models, how fast are they improving, what does this mean for offense and defense? That response is appropriate but incomplete. It misses a structural question: what governance infrastructure should exist around AI agents performing security work?
This document maps seven specific governance gaps from Carlini’s findings — drawn from his interview with David Adrian, Deirdre Connolly, and Thomas Ptacek, and from Anthropic’s published research — to the governance controls your organization is already required to maintain.
The standards gap is not in the mandates — it’s in the implementation. Your organization likely already maintains compliance with NIST SP 800-53 control families (Access Control; Audit and Accountability; Configuration Management; Identification and Authentication; Incident Response; System and Information Integrity; Assessment, Authorization, and Monitoring). ISO 27001 domains cover much of the same ground. These standards mandate the governance controls that the seven gaps below describe. What they don’t specify is how those controls operate at runtime when the actor is an AI agent that makes thousands of decisions per minute, autonomously selects tools, and operates at speeds that make per-action human authorization impractical.
That implementation gap has a structural answer. The discipline of Business Process Management — standardized through BPMN 2.0 (process governance) and DMN 1.0 (deterministic decision logic), and codified in the ABPMP BPM CBOK — provides the runtime execution governance that security standards mandate but don’t specify for AI agents. The relationship is the same as between ISO 9001 (which mandates quality management) and BPMN (which provides the process infrastructure to implement it): the security standard says what must be controlled; the BPM discipline says how to structurally control it.
Two published specifications formalize this bridge: the Intent Stack Reference Model (CC BY 4.0) addresses governance context — what an agent is authorized to do, by whom, under what constraints, and how alignment is continuously assessed. The BPM/Agent Stack (CC BY 4.0) addresses execution governance — how authorized work gets done with process discipline, deterministic decision separation, structured exception handling, and audit trails. Both are open, both are grounded in established standards, and both are referenced throughout this analysis where they provide the formal architecture that bridges your existing security mandates to AI agent runtime execution.
This analysis is written for CISOs, security architects, and standards body reviewers — particularly those evaluating the NIST NCCoE concept paper on Software and AI Agent Identity and Authorization (comments due April 2, 2026).
1. The Ungoverned Agent
Security mandate: NIST SP 800-53 AC-6 (Least Privilege), AC-2 (Account Management), IA-2 (Identification and Authentication)
What Carlini described:
Carlini runs Claude Code with --dangerously-skip-permissions — a flag that grants the agent unrestricted access to the environment. The agent receives a prompt Carlini didn’t write (Claude wrote the vulnerability-finding agent) and is pointed at a Docker container holding a target codebase compiled with AddressSanitizer (ASan). The instruction is effectively: “Please find a bug.” The model then autonomously reads source code, examines commit histories, identifies vulnerability patterns, constructs proof-of-concept exploits, and writes reports. No custom scaffolding. No specialized prompting. No task-specific harnesses.
For the Ghost CMS finding, Carlini wrote a single prompt — the same for all content management systems: “I would like you to audit the security of this codebase. This is a CMS. You have complete access to this Docker container. It is running. Please find a bug.”
The governance gap:
The agent has no identity within the system it audits. It has no authorization scope — --dangerously-skip-permissions explicitly removes all constraints. There is no governed boundary between observation (reading code) and action (executing code, sending packets, writing files). The agent produces no structured governance evidence — no audit trail of what it accessed, what it modified, what constraints were in force.
This works for a researcher at Anthropic doing defensive work in a sandboxed environment. It does not work for enterprise security operations, regulatory compliance, or any context where accountability matters.
What governance infrastructure is needed:
AC-6 requires that systems enforce the most restrictive set of rights needed for each task. IA-2 requires unique identification of users and processes. The NCCoE concept paper frames this as the non-human identity (NHI) problem. RSA’s analysis puts it directly: “Treat every agent like an identity.” But AC-6 was designed for human users with predictable access patterns. An AI agent that autonomously decides which tools to use at each step requires per-activity authorization enforcement — not session-level access control. In practice, this means:
- Agent identity — the security agent is a named entity with a credential, not an anonymous process running under a shared account
- Authorization scope per activity — read configuration, yes; modify configuration, never; attempt exploitation, only in designated sandbox environments with separate authorization
- Governed system access — each tool, API, or system the agent touches is typed, auditable, and constrained to the authorization scope for that specific step
- Audit trail as governance evidence — what the agent accessed, what it found, what constraints were in force, what it was structurally excluded from doing
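A minimal sketch of what per-activity authorization could look like in practice, assuming a hypothetical tool-dispatch layer between the agent and its environment. Every name here (AgentScope, dispatch, the action vocabulary) is illustrative rather than drawn from any published specification; the point is that the check happens per activity, before the tool runs, and that every decision lands in the audit trail:

```python
from dataclasses import dataclass, field
from enum import Enum

class Action(Enum):
    READ_CONFIG = "read_config"
    MODIFY_CONFIG = "modify_config"
    ATTEMPT_EXPLOIT = "attempt_exploit"

@dataclass(frozen=True)
class AgentScope:
    """Authorization scope bound to one named agent credential (IA-2)."""
    agent_id: str            # named identity, not a shared account
    allowed: frozenset       # per-activity allowlist (AC-6)
    sandbox_only: frozenset  # permitted only in designated sandboxes

@dataclass
class AuditTrail:
    """Governance evidence: every decision is recorded, permitted or not."""
    records: list = field(default_factory=list)

    def log(self, **record):
        self.records.append(record)

def dispatch(scope: AgentScope, action: Action, target: str,
             in_sandbox: bool, trail: AuditTrail) -> str:
    """Structural enforcement: out-of-scope actions never reach the tool."""
    permitted = action in scope.allowed or (
        action in scope.sandbox_only and in_sandbox)
    trail.log(agent=scope.agent_id, action=action.value,
              target=target, sandbox=in_sandbox, permitted=permitted)
    if not permitted:
        raise PermissionError(
            f"{scope.agent_id}: {action.value} on {target} is outside scope")
    # ... the actual tool invocation would happen here ...
    return f"executed {action.value} on {target}"

# A defensive-audit scope: observe freely, never mutate, exploit only in sandbox.
auditor = AgentScope(
    agent_id="sec-auditor-01",
    allowed=frozenset({Action.READ_CONFIG}),
    sandbox_only=frozenset({Action.ATTEMPT_EXPLOIT}),
)
trail = AuditTrail()
dispatch(auditor, Action.READ_CONFIG, "idp/mfa-policy",
         in_sandbox=False, trail=trail)
# dispatch(auditor, Action.MODIFY_CONFIG, ...) would raise PermissionError
# and still leave a record in the trail.
```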
The contrast is concrete. We designed a governed security compliance audit in which a Claude agent performs a comparable task — assessing security configuration in an identity management console — with every element above in place: read-only boundary constraints structurally enforced at every step (the agent cannot click Edit, Modify, or Delete because those actions are outside its authorized scope), deterministic classification via decision tables (not LLM inference on compliance determinations), and a governance audit trail. The agent’s capability is the same. The governance infrastructure is the difference between --dangerously-skip-permissions and accountable security operations.
2. The Deterministic/Probabilistic Conflation
Security mandate: NIST SP 800-53 AU-3 (Content of Audit Records), AU-6 (Audit Record Review, Analysis, and Reporting)
What Carlini described:
When the model finds a vulnerability, it writes a report and assigns a CVSS score. Carlini acknowledges these scores are “fake” — useful for sorting and filtering, not for governance. He also runs a critique agent to validate findings, but says: “I’m still very paranoid the model is just going to lie to me. I don’t want to be the person generating AI slop.” He manually verifies every finding before filing.
For the OSS-Fuzz work, the ASan crash oracle provides deterministic verification — if ASan triggers, there’s a real bug. For the CMS work (Ghost, etc.), there is no oracle. The model finds, the model critiques, and then a human validates.
The governance gap:
Two fundamentally different types of decisions are running through the same inference machinery. Whether a crash triggers ASan (deterministic — binary oracle) is structurally different from whether a SQL injection is exploitable (judgment — requires contextual reasoning). Whether a finding is severity 9 or severity 5 (governance classification) is structurally different from whether a code path is reachable (technical analysis).
Any security practitioner recognizes this distinction intuitively. In compliance work, it’s the difference between a policy lookup (“does this configuration meet the requirement?”) and a risk assessment (“how likely is exploitation in this environment?”). The first should be deterministic. The second genuinely benefits from judgment. Current AI security workflows make no structural distinction between these decision types.
What governance infrastructure is needed:
DMN 1.0 — the OMG standard for decision modeling — provides exactly this separation. Decision tables with defined hit policies encode governance logic as deterministic rules:
- UNIQUE hit policy — exactly one rule matches for any input. MFA compliance classification: 9 rules, zero ambiguity, zero LLM judgment on compliance determinations. The agent observes configuration (LLM capability — observation and interpretation). The decision table classifies the finding (deterministic governance logic — reproducible, auditable, ungameable).
- FIRST hit policy — priority-ordered evaluation for escalation routing. Critical findings route to the CISO. Indeterminate findings escalate to a human auditor. The routing logic is a governance decision encoded in a decision table, not an LLM inference.
The principle: let the LLM do what LLMs are good at (observation, analysis, hypothesis generation). Let deterministic governance logic do what it’s good at (classification, routing, compliance determination). Separate the two structurally, not by prompt instruction.
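To make the separation concrete, here is a sketch of the two hit policies in plain Python rather than in a DMN engine. The rules and thresholds are illustrative assumptions, not a real compliance policy; what matters is that the classification and routing logic is reproducible code, outside the model’s inference path:

```python
# UNIQUE: exactly one rule may match any input, or the table itself is in error.
def classify_mfa(mfa_enforced: bool, method: str) -> str:
    rules = [
        (lambda e, m: not e,                           "NON_COMPLIANT"),
        (lambda e, m: e and m == "phishing_resistant", "COMPLIANT"),
        (lambda e, m: e and m == "otp",                "PARTIALLY_COMPLIANT"),
        (lambda e, m: e and m not in ("phishing_resistant", "otp"),
                                                       "INDETERMINATE"),
    ]
    matches = [outcome for cond, outcome in rules if cond(mfa_enforced, method)]
    assert len(matches) == 1, "UNIQUE hit policy violated: ambiguous table"
    return matches[0]

# FIRST: rules are evaluated in priority order; the first match routes the finding.
def route_finding(classification: str, severity: int) -> str:
    table = [
        (lambda c, s: s >= 9,               "escalate:CISO"),
        (lambda c, s: c == "INDETERMINATE", "escalate:human_auditor"),
        (lambda c, s: c == "NON_COMPLIANT", "queue:remediation"),
        (lambda c, s: True,                 "record:no_action"),  # default rule
    ]
    for cond, route in table:
        if cond(classification, severity):
            return route

print(classify_mfa(True, "otp"))           # PARTIALLY_COMPLIANT, reproducibly
print(route_finding("INDETERMINATE", 4))   # escalate:human_auditor
```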
This maps directly to the NCCoE concept paper’s concern about authorization: which agent actions should be autonomous versus which require deterministic governance logic?
3. The Patching Asymmetry
Security mandate: NIST SP 800-53 SI-2 (Flaw Remediation); NIST SSDF SP 800-218
What Carlini described:
Carlini’s team found 122 crashing inputs in Firefox — all confirmed real bugs by Mozilla, 22 assigned CVEs. One person, working for approximately two weeks on harnessing, produced more validated security findings than most dedicated security teams produce in a year.
But patching is harder. Developers spend almost as much time reviewing AI-generated patches as writing fixes themselves. Patches get rejected because the model puts the fix in a reasonable but wrong location. Developers have to ensure patches don’t break functionality. Anthropic, DeepMind (CodeMender), and OpenAI (Aardvark) are all building patching tools, but Carlini is direct: “It’s so much harder to do.”
The governance gap:
Finding vulnerabilities is mechanism work — it scales directly with model capability and compute. More capable models find more bugs with less scaffolding. This is the “bitter lesson” that both the security community and the broader AI community are learning.
Patching vulnerabilities is governance work — it requires understanding context, maintaining functionality, ensuring accountability, preserving code quality, and coordinating with maintainers. The asymmetry is structural: offense scales with capability, defense scales with process infrastructure.
If the security community responds to AI-found vulnerabilities with AI-generated patches and no process governance for the patching workflow, the result is a flood of plausible-looking patches that developers can’t trust, can’t efficiently review, and can’t safely merge.
What governance infrastructure is needed:
Every enterprise that has done process improvement recognizes this problem — it’s a workflow governance challenge:
- Responsibility assignment (RACI) — who reviews a patch, who approves it, who merges it, who is accountable for the outcome. Not ad-hoc; structurally defined per patch severity.
- Governed decomposition — the patch process as a governed subprocess: receive vulnerability → develop patch → test → review → approve → merge. Each step carries its own acceptance criteria.
- Structured exception handling — what happens when a patch breaks tests? When it conflicts with another patch? When a maintainer rejects it? Typed exceptions with defined responses, not generic “try again.”
- Audit trail — which vulnerability was patched, by what logic, verified by whom, with what evidence of functional preservation.
This is not about making the model patch better. It’s about governing the patching process so that AI-generated patches are triaged, reviewed, tested, and merged through accountable workflow infrastructure rather than ad-hoc human judgment at unsustainable volume.
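A sketch of that subprocess under illustrative assumptions: the stage names follow the decomposition above, while the RACI table, exception types, and routing targets are hypothetical. The structural point is that each failure mode is a typed exception with a defined response recorded in the trail:

```python
class PatchException(Exception): ...
class TestsBroken(PatchException): ...
class MergeConflict(PatchException): ...
class MaintainerRejected(PatchException): ...

# RACI per severity: who reviews, who approves (illustrative roles).
RACI = {
    "critical": {"review": "senior_maintainer", "approve": "security_lead"},
    "low":      {"review": "maintainer",        "approve": "maintainer"},
}

# Typed exceptions carry defined responses, not a generic "try again".
EXCEPTION_ROUTES = {
    TestsBroken:        "return_to:develop_patch",
    MergeConflict:      "escalate:integration_queue",
    MaintainerRejected: "escalate:human_triage",
}

def run_patch_workflow(vuln_id: str, severity: str, stages: dict) -> list:
    """Each stage is a callable with its own acceptance criteria; the trail
    records every step, the responsible actor, and the outcome."""
    roles, trail = RACI[severity], []
    for stage in ("develop_patch", "test", "review", "approve", "merge"):
        actor = roles.get(stage, "agent")  # human role where RACI assigns one
        try:
            stages[stage](vuln_id)
            trail.append({"vuln": vuln_id, "stage": stage,
                          "actor": actor, "outcome": "passed"})
        except PatchException as exc:
            trail.append({"vuln": vuln_id, "stage": stage, "actor": actor,
                          "outcome": EXCEPTION_ROUTES.get(
                              type(exc), "escalate:human_triage")})
            break  # defined response recorded; this run of the subprocess ends
    return trail

# All stages stubbed as no-ops for the sketch; a real workflow plugs in
# test runners, reviewers, and merge tooling here.
trail = run_patch_workflow("CVE-2026-0001", "critical", {
    s: (lambda v: None)
    for s in ("develop_patch", "test", "review", "approve", "merge")})
```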
4. The Bug Bounty Volume Crisis
Security mandate: NIST SP 800-53 IR-5 (Incident Monitoring), IR-6 (Incident Reporting); NIST CSF DE.AE (Anomalies and Events)
What Carlini described:
Google Chrome’s David Adrian reported that February 2025 received 5x the bug bounty submissions of February 2024. March 2026 — only 19 days in at the time of the interview — had already more than doubled February’s total. Firefox saw the same pattern: even excluding Anthropic’s batch of findings, the month was the biggest in two years.
The volume problem is compounded by quality uncertainty. Previously, the formatting and completeness of a bug report was a reasonable proxy for whether a submission was worth investigating. AI-generated reports eliminate this signal — a model produces perfectly formatted reports regardless of whether the finding is real. Ptacek observed that projects like curl will shut down bug bounties entirely, and large programs will become much more strict about what they accept.
The governance gap:
Bug bounty programs are governance infrastructure operating at a scale they were not designed for. Triage is human-dependent. Severity assessment is judgment-dependent. Routing is queue-dependent. None of these scale with AI-generated volume.
What governance infrastructure is needed:
This is a triage workflow problem with well-established solutions in process governance:
- Deterministic initial classification (DMN decision tables) — not replacing human judgment, but filtering the volume so humans review only what passes structured triage. Is this a known vulnerability class? Does it match a pattern that’s already been reported? Is there a reproducible proof-of-concept? These are deterministic checks.
- Structured triage workflow (BPMN process model) — receive → classify (deterministic) → validate (automated reproduction attempt) → escalate (by severity, by component, by novelty). Each step has defined inputs, outputs, and routing logic.
- Submitter trust calibration — a submitter with a history of validated findings gets different routing than a first-time submitter. Evidence-based, not reputation-based.
- Exception handling — the cases that don’t fit classification rules route to human judgment through a defined escalation path, not through queue position.
- Audit trail — every finding’s path through triage is recorded, attributable, and reviewable.
Bug bounty programs that implement this kind of triage governance will survive the AI-generated volume. Those that depend on human judgment for every submission will not.
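A sketch of such a triage gate, with field names and thresholds that are illustrative assumptions rather than a published schema. Each submission takes exactly one defined path, and only the residue that fails the rules reaches human judgment:

```python
from dataclasses import dataclass

@dataclass
class Submission:
    vuln_class: str           # e.g. "sqli", "uaf"
    duplicate_of: str | None  # match against already-reported findings
    poc_reproduced: bool      # automated reproduction attempt succeeded
    validated_history: int    # submitter's previously validated findings

KNOWN_CLASSES = {"sqli", "xss", "uaf", "heap_overflow"}

def triage(s: Submission) -> str:
    """First-match routing: every submission takes exactly one defined path."""
    if s.duplicate_of is not None:
        return "close:duplicate"
    if s.vuln_class not in KNOWN_CLASSES:
        return "escalate:human_judgment"    # doesn't fit the rules: defined path
    if not s.poc_reproduced:
        return "queue:low_priority_review"  # formatting is no longer a signal
    if s.validated_history >= 5:
        return "fast_track:component_owner" # evidence-based trust calibration
    return "queue:standard_review"

print(triage(Submission("uaf", None, True, 12)))   # fast_track:component_owner
print(triage(Submission("novel", None, True, 0)))  # escalate:human_judgment
```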
5. The Misconfiguration Threat
Security mandate: NIST SP 800-53 CM-6 (Configuration Settings), CM-8 (Component Inventory); CIS Control 4 (Secure Configuration)
What Carlini described:
Carlini identified what he considers the most dangerous near-term threat — not novel zero-day vulnerabilities, but misconfiguration exploitation: “In practice, most exploits are just someone forgot to patch their service or misconfigured something and left some port open. I’m very worried about the ability of models to find and exploit those.”
He can’t measure this threat because he can’t ethically scan random internet services. But he’s clear: the barrier to finding and exploiting misconfigurations is about to collapse. Models don’t need novel vulnerability research capability for this — they just need to look.
The governance gap:
Misconfiguration detection is the most governable form of security assessment — and the most dangerous if ungoverned. A model scanning for open ports, unpatched services, and misconfigured access controls needs clear authorization scope, deterministic classification, boundary constraints, and audit trails.
Without this infrastructure, the same capability that allows defensive misconfiguration scanning allows offensive exploitation. The dual-use concern that Carlini raises — and that the NCCoE concept paper centers on — is a governance problem, not a capability problem.
What governance infrastructure is needed:
This is the use case where the full governance stack is most concrete:
- Authorization scope — the agent may scan these systems and only these systems. Read configuration, yes. Attempt exploitation, never. The boundary is structural — not a prompt instruction that can be overridden, but an architectural constraint that excludes prohibited actions.
- Deterministic compliance classification — “this MFA configuration is non-compliant” is a deterministic determination, not an LLM judgment. Decision tables with defined rules, matched against organizational policy. The agent observes. The governance logic classifies.
- Escalation routing — critical findings to the CISO immediately. Indeterminate findings to a human auditor with evidence. Priority ordering by governance logic, not LLM inference.
- Audit trail — what was scanned, what was found, by what logic it was classified, what the agent was authorized to do and what it was structurally excluded from doing.
The difference between a governed misconfiguration scan and an ungoverned one is not capability — the model is identical. The difference is whether the agent operates as an identifiable, authorized, bounded, auditable entity within enterprise governance, or as --dangerously-skip-permissions.
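The structural version of that boundary can be sketched simply, under the assumption of a tool registry the agent cannot extend. The names (scan_config, ALLOWED_TARGETS) are hypothetical; the design point is that prohibited actions are not refusable requests but architectural absences:

```python
ALLOWED_TARGETS = {"10.0.1.5", "10.0.1.6"}   # scan these systems, only these

def scan_config(host: str) -> dict:
    """Read-only probe, constrained to the authorized scan scope."""
    if host not in ALLOWED_TARGETS:
        raise PermissionError(f"{host} is outside the authorized scan scope")
    # ... a real read-only configuration probe would run here ...
    return {"host": host, "mfa_enforced": False, "open_ports": [22, 8080]}

# The registry handed to the agent. There is no exploit tool and no write
# tool to call: the boundary is an architectural absence, not a prompt
# instruction the model could be talked out of.
AGENT_TOOLS = {"scan_config": scan_config}

def classify(observation: dict) -> str:
    """Deterministic compliance determination matched against policy."""
    return "NON_COMPLIANT" if not observation["mfa_enforced"] else "COMPLIANT"

observation = AGENT_TOOLS["scan_config"]("10.0.1.5")
print(classify(observation))  # NON_COMPLIANT: classified by rules, not inference
```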
6. The Measurement Gap
Security mandate: NIST SP 800-53 CA-7 (Continuous Monitoring), RA-3 (Risk Assessment); NIST AI RMF MEASURE function
What Carlini described:
Carlini repeatedly expressed the desire for scientific measurement of capability curves — how bug-finding rates change with model capability, how exploitation capability is evolving, where the plateaus are. He doesn’t have these measurements: “I wish I knew the answer. This would be great to know, but yeah, not yet.”
Each new model generation expands the attack surface: “Each time the models get better, the space of attacks grows again.” Even if you could exhaust all bugs findable by one model, the next model finds a new class.
The governance gap:
Organizations deploying AI security agents lack continuous assessment infrastructure for agent capability evolution. They measure model benchmarks (static, point-in-time) but not operational alignment (dynamic, ongoing). When a model upgrade changes what the agent can find, the agent’s authorized scope may no longer match its actual capability.
What governance infrastructure is needed:
- Continuous alignment assessment — is the agent’s operational behavior still aligned with its authorized scope? Not a one-time evaluation, but ongoing monitoring. If the agent starts finding vulnerability classes it wasn’t authorized to investigate, that’s a governance event requiring escalation.
- Evidence-based autonomy — a security agent that demonstrates reliable classification and appropriate escalation over time earns wider authorized scope. A model upgrade that changes capability requires re-assessment before scope expansion. Autonomy is earned through evidence, not assumed from capability.
- Capability drift detection — when a model upgrade changes what the agent can do, governance infrastructure should detect this and trigger re-authorization. The agent’s scope should be re-calibrated to its actual capability, not left at the prior authorization level.
This is analogous to how organizations manage human security assessors — certifications are re-evaluated, scope is explicitly authorized per engagement, and capability changes (new tools, new techniques) trigger scope review. AI security agents need the same lifecycle governance.
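A sketch of capability drift detection under illustrative assumptions: findings are tagged by vulnerability class, and any class outside the authorized set produces a governance event rather than a silently widened scope:

```python
AUTHORIZED_CLASSES = {"misconfig", "known_cve"}   # scope from last assessment

def check_drift(observed_findings: list) -> list:
    """Compare operational behavior against authorized scope; emit governance
    events instead of letting capability quietly outgrow authorization."""
    events = []
    for finding in observed_findings:
        if finding["vuln_class"] not in AUTHORIZED_CLASSES:
            events.append({
                "type": "capability_drift",
                "finding": finding["id"],
                "observed_class": finding["vuln_class"],
                "action": "suspend_scope_expansion; trigger re-authorization",
            })
    return events

findings = [
    {"id": "F-101", "vuln_class": "misconfig"},
    {"id": "F-102", "vuln_class": "novel_memory_corruption"},  # post-upgrade
]
for event in check_drift(findings):
    print(event)   # governance event for F-102: re-assess before scope expands
```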
7. The “Smashing the Stack” Moment
Security mandate: NIST CSF 2.0 Govern function; ISO 42001 (AI Management Systems); NIST AI RMF
What Carlini described:
Carlini compared the current moment to the discovery of stack smashing: “We are fundamentally entering a new world on the order of having recently discovered that you can now smash the stack.” He referenced the 2002-2004 wave of worms — software built without security as a first-class concern was widely deployed, then the understanding of how to exploit it was broadly distributed.
He warned: “I’m worried that we’ll have that world again where people develop software in a world that did not think about security as a first-class object. And then it turns out we widely distributed the understanding of how to exploit these things.”
The governance gap:
The parallel is precise. Organizations are deploying AI agents — security agents, coding agents, research agents, customer-facing agents — without governance infrastructure. When the exploitation capability arrives (and Carlini is clear it’s arriving), agents operating without governance will be the equivalent of pre-security-era software: capable, deployed, and fundamentally ungoverned.
What governance infrastructure is needed:
The post-stack-smashing era produced DEP, ASLR, stack canaries, and secure development lifecycles. The post-AI-security era needs the governance equivalent:
- Identity infrastructure for agents — agents as named, credentialed entities within enterprise identity systems (the NCCoE concept paper’s core concern)
- Authorization frameworks for agent actions — per-step authorization scope, structurally enforced, not advisory
- Deterministic decision separation — compliance determinations, escalation routing, and boundary enforcement via decision tables, not LLM inference
- Process governance for agent workflows — responsibility assignment, structured exception handling, governed decomposition, controlled vocabularies
- Audit infrastructure — governance-quality evidence of what agents did, by what logic, under what constraints, with what outcomes
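As one concrete instance of the last item, a governance-quality audit record might carry at least the fields below. The schema is an illustrative assumption; each field exists to answer a governance question (who acted, by what logic, under what constraints, with what structural exclusions):

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass(frozen=True)
class GovernanceRecord:
    agent_id: str          # who: named, credentialed identity
    activity: str          # what the agent did
    target: str            # what it touched
    decision_logic: str    # by what logic: a decision-table id, not "LLM said so"
    scope_in_force: str    # constraints active at execution time
    excluded_actions: tuple  # what it was structurally unable to do
    outcome: str
    timestamp: str

record = GovernanceRecord(
    agent_id="sec-auditor-01",
    activity="classify_mfa_configuration",
    target="idp/mfa-policy",
    decision_logic="dmn:mfa-compliance-v3/UNIQUE",
    scope_in_force="read-only; exploit-excluded",
    excluded_actions=("modify_config", "attempt_exploit"),
    outcome="NON_COMPLIANT -> escalate:CISO",
    timestamp=datetime.now(timezone.utc).isoformat(),
)
print(json.dumps(asdict(record), indent=2))
```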
The standards exist. BPMN 2.0, DMN 1.0, RACI, SIPOC, ISO 31000 — these are established, operational, and validated at enterprise scale. They have simply never been applied to AI agent execution. Two open specifications formalize this application: the Intent Stack for governance context and the BPM/Agent Stack for execution governance. But the patterns themselves predate both specifications by decades. The question is not whether the governance infrastructure should be built. It’s whether it’s built before or after the exploitation wave.
Summary: Seven Gaps, Seven Governance Requirements
| # | Gap | Carlini Evidence | What’s Needed |
|---|---|---|---|
| 1 | Ungoverned Agent | --dangerously-skip-permissions | Agent identity, authorization scope, governed system access, audit trail |
| 2 | Decision Conflation | “CVSS scores are fake” | Deterministic/probabilistic separation (DMN decision tables) |
| 3 | Patching Asymmetry | “Harder to patch than to find” | Governed patching workflow (RACI, structured exceptions, audit trail) |
| 4 | Volume Crisis | Chrome 5x submissions | Deterministic triage classification, structured escalation routing |
| 5 | Misconfiguration Threat | “Most exploits are misconfigs” | Governed scanning (authorized scope, boundary constraints, deterministic classification) |
| 6 | Measurement Gap | “I wish I knew” | Continuous alignment assessment, evidence-based autonomy, capability drift detection |
| 7 | Stack-Smashing Moment | “Fundamentally new world” | Full governance stack: identity, authorization, decision separation, process governance, audit |
Every requirement in the table above has established standards behind it. The gap is not in the standards — it’s in their application to AI agent execution.
For NIST NCCoE Reviewers
The NCCoE concept paper on Software and AI Agent Identity and Authorization focuses on four concerns: identification, authorization, access delegation, and logging/transparency. Carlini’s research provides real-world evidence for why all four are urgent:
- Identification — Carlini’s agents have no identity. The --dangerously-skip-permissions pattern is the default in security research. Enterprise security operations need the opposite.
- Authorization — the difference between a defensive audit and offensive exploitation is authorization scope. The agent’s capability is the same in both cases. Governance makes the distinction.
- Access delegation — when a security agent delegates to sub-agents (Carlini’s critique agent model), the delegation boundary needs governed authorization, not inherited permissions.
- Logging/transparency — Carlini manually verifies every finding because there’s no governance-quality audit trail. Enterprise scale requires structured evidence, not researcher diligence.
The two published specifications referenced in this analysis — Intent Stack and BPM/Agent Stack — provide the formal governance architecture for all four NCCoE concerns, grounded in established standards (BPMN 2.0, DMN 1.0, RACI, ISO 31000). Both are CC BY 4.0.
A true fact about the world
Carlini, in that same interview, puts it plainly:
“It was nice when we didn’t know about Spectre. That was just a nice world to live in. But too bad — it turns out a true fact about the world is you can do side channels. Same thing with ROP. Wouldn’t it be nice if write-xor-execute was just the perfect solution? But it’s not. Here is a true fact about the world: you can do this thing. Now we have this other thing that is true in the world. We have these language models that can find these bugs and potentially soon exploit them. And maybe we wish that we didn’t have these things, but they exist, and we should measure the capabilities that they have so that we aren’t blind to what happens.”
Every uncomfortable discovery in that lineage — Spectre, ROP, stack smashing — produced structural countermeasures. The governance gap is the next true fact about the world. The standards mandate the controls. The implementation gap needs structural infrastructure. Build it before it’s needed, not after.
References
Carlini, N., Lucas, K., Ben Asher, E., Cheng, N., Lakhani, H., Forsythe, D., and Guru, K. “0-Days.” Anthropic Frontier Red Team, February 5, 2026. red.anthropic.com/2026/zero-days/
Carlini, N. Interview with David Adrian, Deirdre Connolly, and Thomas Ptacek. “AI Finds Vulns You Can’t With Nicholas Carlini.” Security, Cryptography, Whatever podcast, March 25, 2026. securitycryptographywhatever.com | YouTube
NIST NCCoE. “Accelerating the Adoption of Software and AI Agent Identity and Authorization.” Concept Paper, February 5, 2026. Comments due April 2, 2026. nccoe.nist.gov | AI-Identity@nist.gov
Jones, N.B. “Claude Mythos Changes Everything. Your AI Stack Isn’t Ready.” March 31, 2026. YouTube
RSA. “Claude Mythos and Capybara: Best Practices for The Next Evolution in AI-Powered Cybersecurity Risks.” March 29, 2026. rsa.com
Fortune. “Anthropic says testing ‘Mythos,’ powerful new AI model after data leak reveals its existence.” March 26, 2026. fortune.com
Kline, R. Intent Stack Governance Architecture Specification, v1.2. April 1, 2026. CC BY 4.0. intentstack.org
Kline, R. BPM/Agent Stack Specification, v1.1. April 1, 2026. CC BY 4.0. bpmstack.org