Tools: When AI Agents Trust Each Other: The Multi-agent Security Problem...
If you've been following AI agent security at all, you already know the baseline is grim. At Black Hat USA 2025, Zenity Labs demonstrated working exploits against Microsoft Copilot, ChatGPT, Salesforce Einstein, and Google Gemini — in the same session. One demo showed a crafted email triggering ChatGPT to hand over access to a connected Google Drive. Copilot Studio was leaking CRM databases.
Then came CVE-2025-32711 (dubbed "EchoLeak") — a CVSS 9.3 vulnerability in Microsoft 365 Copilot where receiving a single crafted email triggered automatic data exfiltration. No user clicks required. The email arrives, the agent processes it, and your data leaves.
In November 2025, Anthropic confirmed that a Chinese state-sponsored group had weaponised Claude Code to target roughly 30 organisations across tech, finance, chemical manufacturing, and government. What made it unprecedented: 80-90% of tactical operations were executed by the AI agents themselves with minimal human involvement.
Bruce Schneier summarised the situation bluntly: "We have zero agentic AI systems that are secure against these attacks."
These are all single-agent problems. One agent, one set of tools, one attack surface. Difficult, but at least conceptually bounded. You know what you're defending.
The shift from single agents to multi-agent orchestration isn't just a scaling problem — it's a category change in the nature of the vulnerability.
Deloitte reports that 23% of companies are already using AI agents moderately, projecting 74% adoption by 2028. As organisations scale their deployments, they're naturally moving from "one agent does a task" to "multiple agents collaborate on complex workflows." Research agents feed into analysis agents, which feed into action agents. Reasonable architecture. Catastrophic security implications.
The core problem is devastatingly simple: agents trust each other by default.
When your researcher agent passes output to your writer agent, the writer treats that output as a legitimate instruction. There's no verification. No cryptographic signing. No provenance checking. Agent A's output is literally Agent B's input — and in the world of language models, there is no reliable distinction between "data to process" and "instruction to follow."
This means that if you compromise Agent A, you automatically get Agent B, Agent C, and whatever databases or APIs they have access to. You don't need to attack each agent individually. You need one entry point, and the t
Source: Dev.to