State of AI Security 2026 | Breaking Circuits

01 The Paradigm Shift

The Enterprise Is Delegating Autonomy to Systems It Cannot Fully Trust

We have crossed a threshold. Generative AI—the static, stateless chatbot you queried and forgot—is giving way to something categorically different: autonomous agents that perceive, plan, and act without human intervention.

These systems don't just answer questions. They invoke APIs, write and execute code, query databases, send emails, manage infrastructure, and maintain persistent memory across sessions. By 2028, industry projections indicate that 33% of all enterprise software applications will incorporate agentic capabilities—with roughly 15% of operational decisions made autonomously, without a human in the loop.

The promise is real. So is the exposure.

33%

Enterprise Apps Agentic by 2028

Autonomous agents embedded across the software stack, from finance to healthcare

15%

Decisions Made Without Human Review

Day-to-day operational choices executed end-to-end by machine reasoning

Malicious Input to Compromise

Reconnaissance, weaponization, and exploitation collapsed into a single crafted payload

Traditional security models were built to defend systems with deterministic execution paths. Agentic AI does not have deterministic execution paths. It has semantic instruction following—and that distinction is the root cause of an entirely new threat taxonomy.

"Security failures no longer manifest as inaccurate text generation. They materialize as unauthorized privilege escalation, data exfiltration, and the automated propagation of malicious payloads."

02 Attack Surface Anatomy

Five Components. Each One a Potential Compromise Vector.

Understanding agentic risk requires mapping the operational architecture. An autonomous agent is not a monolith—it's a distributed system executing a continuous sense-plan-act loop, with five distinct components that each carry their own exposure.

🧠

Reasoning Engine (LLM)

Central cognitive processor. Interprets intent, decomposes goals, evaluates outcomes. Vulnerable to semantic manipulation—no code exploit required.

💾

Memory Module

Short-term working memory and long-term vector databases (RAG). Persistent memory poisoning can silently corrupt agent behavior across sessions—indefinitely.

⚙️

Tool Execution Layer

APIs, file systems, code interpreters, databases. The bridge between probabilistic reasoning and real-world action. Highly exploitable via MCP vulnerabilities.

🔗

Orchestration Framework

Governs task delegation across multi-agent topologies. In hierarchical systems, a single compromised orchestrator grants control of all subordinate agents.

Topology Determines Blast Radius

Topology	Structure	Primary Use	Risk Profile
Single-Agent	One engine manages full sense-plan-act loop	Focused tasks, basic coding	MODERATE Total failure if hijacked
Hierarchical	Orchestrator delegates to specialized subordinates	Enterprise workflows, approvals	HIGH Leader compromise = full access
Federated	Peer agents collaborate without central control	Decentralized research, supply chain	CRITICAL Cascading failures, worm propagation

03 The Semantic Threat Landscape

Exploiting Cognitive Logic—No Shellcode Required

The defining characteristic of agentic attacks is that they require no binary exploitation. Because these systems are designed to follow natural language, adversaries can subvert operational logic through the same interface the system uses to receive legitimate instructions.

⚠️

The Lethal Trifecta

The dangerous intersection of three properties creates catastrophic risk: sensitive data access + continuous exposure to untrusted content + autonomous capability to communicate externally. When all three converge, a single crafted input becomes a complete intrusion.

Indirect Prompt Injection & The Confused Deputy

Unlike direct jailbreaks through a chat interface, indirect prompt injection occurs when an agent autonomously retrieves external data containing hidden adversarial instructions. The agent cannot distinguish between a developer's system prompt and a directive embedded in a malicious document, email, or web page.

The result is the "confused deputy" problem: the agent executes attacker-defined actions using its own fully authorized privileges. Two documented cases from 2024–2025 illustrate how operational this threat already is:

Slack AI / Aug 2024

Private Channel Data Exfiltration via Invisible Public Message

An invisible payload in a public channel hijacked Slack's AI assistant during summarization, causing it to query restricted private channels and exfiltrate sensitive content via a rendered URL—no user interaction required.

CRITICAL

CVE-2025-32711

"EchoLeak" — Microsoft 365 Copilot Data Exfiltration

A single crafted email triggered M365 Copilot to autonomously execute unauthorized data exfiltration routines without user interaction. Receiving the email was sufficient to trigger the exploit chain.

CRITICAL

Memory Poisoning & Long-Horizon Goal Hijacking

Where prompt injections affect a single session, memory poisoning creates durable, persistent compromise. Attackers inject malicious instructions through ordinary query interactions that corrupt the agent's vector database. Current systems treat stored embeddings as trusted facts—a phenomenon researchers call "Vector Haze."

🔬

MINJA Attack Framework Results

Achieving a 98%+ injection success rate and over 76% attack success rate under realistic deployment constraints—with zero direct write access to storage or model weights. The attack persists silently for days or weeks before triggering on an authorized user query.

In healthcare deployments, poisoned EHR agents have been demonstrated to silently swap patient identifiers—leading to misdiagnosis and contraindicated treatment recommendations. In financial contexts, poisoned due-diligence pipelines can steer investment advisory agents to recommend fraudulent equities across multiple clients over extended periods.

Agentic Worms & Multi-Agent Cascading Failures

In federated and hierarchical ecosystems, a compromised agent does not fail in isolation. Its corrupted outputs become the trusted inputs for downstream agents—enabling self-propagating "ZombAI" worms that leverage authorized tooling to harvest contacts and generate targeted phishing payloads across the corporate ecosystem.

🧨

Inter-Agent Trust Exploitation

Comprehensive evaluations show that up to 82.4% of leading models will execute a malicious payload when routed through a peer agent—even if they successfully refused the same payload from a human user. Trust escalation between agents is a first-class attack vector.

04 Infrastructure Fragility

The Foundation Has Cracks: MCP, Inference Engines, and the ShadowMQ Crisis

Semantic attacks exploit cognitive logic. But the physical execution layer beneath every agentic workflow—tensor libraries, serialization pipelines, communication protocols—runs on deeply legacy C/C++ codebases with no memory safety guarantees. The attack surface converges at the worst possible intersection.

The Model Context Protocol: Speed Over Security

The Model Context Protocol (MCP) has become the de facto standard for connecting LLMs to external systems—but its adoption velocity has far outpaced security hardening. Authentication is inconsistently applied. Authorization models grant blanket access rather than least-privilege. Encryption is frequently optional.

MCP servers execute with the same privileges as the client application hosting them. In early 2026, security researchers disclosed over 30 CVEs in MCP implementations within a 60-day window—many caused by foundational flaws like path traversal and argument injection in reference implementations that had been copy-pasted directly into production.

CVE-2026-26118

Azure MCP Server — SSRF Privilege Escalation

Server-side request forgery in the Azure MCP server allowed privilege escalation via an obtained managed identity token, potentially compromising the entire cloud environment from a single crafted input.

CRITICAL

MCP Audit 2025

43% of Remote MCP Servers Vulnerable to Command Injection

Independent audit found nearly half of analyzed public MCP servers allowed arbitrary system command execution via basic injection. Tool poisoning attacks manipulating mutable MCP schemas grant RCE on the host machine.

HIGH

The ShadowMQ Deserialization Crisis

In 2024–2025, virtually every major open-source inference engine was affected by a common pattern: unauthenticated ZeroMQ sockets bound to all network interfaces, passing data directly to Python's pickle deserializer. The result was remote code execution achievable by anyone who could reach the port.

Affected systems included Meta Llama Stack, vLLM, NVIDIA TensorRT-LLM, and Modular Max Server. Because the vulnerable pattern was copy-pasted across repositories, it propagated massively—allowing attackers who scanned for the ZMQ TCP banner to bypass all semantic guardrails and achieve direct GPU-level RCE.

Memory Corruption in Tensor Processing

CVE-2025-62164

vLLM — Malformed Tensor Out-of-Bounds Write via PyTorch 2.8.0

PyTorch 2.8.0 disabled sparse tensor integrity checks by default. vLLM's Completions API accepted precomputed embeddings, which when malformed triggered a massive out-of-bounds memory write in to_dense()—enabling RCE directly in the vLLM server process.

CRITICAL

CVE-2026-21869

llama.cpp — Negative n_discard Causes Deterministic Memory Corruption

Negative integer input to the n_discard parameter processed without validation causes reversed memory range arithmetic, deterministically corrupting the context window management and enabling RCE.

CRITICAL

CVE-2025-49847

llama.cpp — 64→32-bit Token Length Truncation in Vocabulary Loader

Casting a 64-bit token length to 32-bit bypasses length validation, allowing a malicious model file to overwrite critical memory regions during memcpy during the vocabulary-loading phase.

CRITICAL

"Placing a highly autonomous agent with deep infrastructural access into a standard Linux container provides a dangerous illusion of security."

Breaking Circuits Security Research, 2026

05 The Architectural Imperative

Hardware-Enforced Memory Safety: Security at the Silicon Level

The volume, severity, and recurrence of memory corruption in AI infrastructure demonstrates a fundamental truth: reactive software patching cannot keep pace with the attack surface. Migrating inference engines to Rust would help—but rewriting millions of lines of optimized C/C++ underpinning PyTorch, CUDA, and Linux-level drivers is economically infeasible.

The cybersecurity boundary must move down to the microprocessor. Hardware-enforced memory safety eliminates entire vulnerability classes at the architectural level—not the software level.

ARM MTE: Probabilistic Tagging

ARM's Memory Tagging Extension (ARMv9) implements a hardware lock-and-key model: 4-bit tags on 16-byte memory granules, matched against pointer metadata on every load and store. Tag mismatches trigger immediate exceptions—catching buffer overflows and use-after-free bugs with near-zero performance overhead and full binary compatibility with existing C/C++ code.

The limitation is statistical: only 16 possible tag values means sophisticated adversaries can attempt exhaustion or leverage speculative execution side-channels to leak tag assignments.

CHERI: Deterministic Capability Architecture

CHERI (Capability Hardware Enhanced RISC Instructions) takes a categorically different approach. Instead of tagging memory regions, it replaces every pointer in the architecture with an unforgeable 128-bit capability—containing the memory address, exact spatial bounds, permissions, and a hardware-managed 129th validity bit that software cannot directly manipulate.

🔒

CHERI in Practice: CVE-2025-62164 Neutralized

On a CHERI-enabled host, when an attacker submits a malformed tensor payload to vLLM's Completions API, the to_dense() call attempts an out-of-bounds write. The CHERI capability bounds check fires at the exact microsecond the write exceeds the allocated tensor bounds—deterministically halting exploitation at the silicon level, regardless of PyTorch's behavior.

CHERI also enables fine-grained software compartmentalization—down to individual MCP server functions—without context-switching penalties. A compromised MCP server cannot forge capabilities to pivot into the host OS or adjacent agent sessions. The confused deputy problem is solved in hardware.

ARM MTE

CHERI

Intel CET

Paradigm

Probabilistic 4-bit tag matching on 16-byte granules

Deterministic 128-bit capability bounds on every pointer

Shadow stack monitoring function return addresses

Guarantee

High mitigation; bypassed by tag exhaustion or side-channel

Mathematically provable eradication of spatial/temporal memory errors

Protects against ROP; does not stop buffer overflows

AI Fit

Near 100% binary compat with C/C++ inference code

Recompile required; enables hyper-granular compartmentalization

Prevents control flow hijacking even if overflow occurs

Trusted Execution Environments: Protecting Cognitive Guardrails

Hardware protects the execution layer—but autonomous agents also require tamper-proof cognitive governance. The Multilayer Agentic AI Security (MAAIS) framework mandates Alignment Critic agents that evaluate actions against operational and ethical policies before tool execution.

If those critics run in software, a sophisticated goal-hijacking injection can disable them. The solution: instantiate governance logic inside Intel TDX or AMD SEV Trusted Execution Environments—hardware-isolated enclaves where guardrail code is encrypted, invisible to the host OS, and verifiable via remote cryptographic attestation.

TEEs enable "proof-of-guardrail": a verifiable digital signature confirming the unmodified safety code is actively running before any system grants the agent access to sensitive interfaces. Combined with dynamic context scrubbing (Principle of Least Information), this framework starves injection attacks of the contextual data they require to formulate lateral movement queries.

06 The Bottom Line

Security Must Be Embedded in Silicon—Not Bolted On in Software

Agentic AI is not a future threat surface. It is the current one. The same architectural properties that make autonomous agents powerful—persistent memory, high-privilege tool access, continuous ingestion of external content—make them uniquely exploitable.

Conventional defenses have failed at every layer. Prompt filters are bypassed by indirect injection. Software sandboxes are escaped through kernel vulnerabilities. Inference engines carry memory corruption bugs that are re-introduced with every upstream dependency change. Multi-agent trust is exploited at an 82%+ success rate.

The path forward is not more software. It is deterministic, hardware-enforced memory safety: CHERI capabilities that make out-of-bounds writes architecturally impossible, MTE tagging that catches temporal violations in legacy codebases, and TEE-isolated governance that makes safety mechanisms cryptographically verifiable.

As AI advances from assistive generation to autonomous action, security guarantees must be embedded directly into the silicon. At Breaking Circuits, this is the architecture we build toward—deploying agentic AI for municipal infrastructure and critical systems where the cost of getting it wrong is not measured in data but in lives.

AI Security Agentic AI CHERI Critical Infrastructure Memory Safety MCP Vulnerabilities

Autonomous.Vulnerable.Undefended.

The Enterprise Is Delegating Autonomy to Systems It Cannot Fully Trust

Five Components. Each One a Potential Compromise Vector.

Topology Determines Blast Radius

Exploiting Cognitive Logic—No Shellcode Required

Indirect Prompt Injection & The Confused Deputy

Memory Poisoning & Long-Horizon Goal Hijacking

Agentic Worms & Multi-Agent Cascading Failures

The Foundation Has Cracks: MCP, Inference Engines, and the ShadowMQ Crisis

The Model Context Protocol: Speed Over Security

The ShadowMQ Deserialization Crisis

Memory Corruption in Tensor Processing

Hardware-Enforced Memory Safety: Security at the Silicon Level

ARM MTE: Probabilistic Tagging

CHERI: Deterministic Capability Architecture

Trusted Execution Environments: Protecting Cognitive Guardrails

Security Must Be Embedded in Silicon—Not Bolted On in Software

Autonomous.
Vulnerable.
Undefended.