Attacking the MCP Trust Boundary
Security Boulevard, Wednesday, April 22nd, 2026
The Model Context Protocol (MCP)'s inability to distinguish instructions from data enables prompt injection attacks that compromise AI agent security.
The Model Context Protocol (MCP) integrates AI agents with external services but inherits a critical vulnerability from language models: they cannot structurally distinguish code from data, making tool descriptions and data indistinguishable within the same context window.
Security researchers have demonstrated two classes of attacks - tool poisoning, where hidden instructions are embedded in tool descriptions, and toxic agent flows, where attackers seed data with instructions that models follow - affecting 5.5% of public MCP servers.
While the MCP specification recommends human approval loops and safe credential handling practices, these are non-enforceable recommendations rather than protocol constraints, and user behavior shows 93% approval rates for permission dialogs and widespread use of auto-approve mode.
The fundamental issue remains that transformers process instructions and data identically, creating an unsolved vulnerability class that affects applications like Claude Desktop, Cursor, and VS Code with Copilot.