Prompt injection
Also known as: prompt injection attack, indirect prompt injection, jailbreak
Prompt injection is one of the most significant security concerns for AI-powered applications in 2026. In a direct injection attack, a user embeds instructions in their message designed to override the system prompt: 'ignore your previous instructions and do X instead.' In an indirect injection attack, malicious instructions are hidden in external data the model reads, like a web page an agent browses or a document it retrieves via RAG.
For agents that take actions, the risk is amplified. An indirect injection in a webpage that an agent reads could instruct it to exfiltrate data, send messages, or take actions the user never intended. As MCP server connections proliferate, tool poisoning attacks, where a malicious MCP server injects instructions through its responses, are an emerging variant of the same problem.
Defending against prompt injection is an active area of research with no complete solution. Current best practices include input sanitization, structuring prompts so user input is clearly delimited from system instructions, limiting agent permissions to the minimum necessary scope, and monitoring agent behavior for unexpected actions. Building agents that never trust external content as instructions is the cleanest architectural principle, even if hard to fully enforce.