Events Blog

LLM and Agent Red Teaming: Attacks That Nobody Sees

unsplash / jeremy bishop

Anyone running an LLM application or agent in production in 2026 has a security problem they probably do not know about. Classic pentesters, SAST and DAST tools find classic weaknesses: SQL injection, XSS, unsafe deserialization. They do not find prompt injection, tool confusion or goal hijacking. A different methodology is required.

Three layers, three attack vectors

LLM-based systems have three layers, each with its own attack vectors. At the model layer: jailbreaks, adversarial inputs, safety-filter bypass. At the system layer: RAG injection, context poisoning, system-prompt leakage and data exfiltration via connected tools. At the agent layer: goal hijacking, tool confusion and memory poisoning.

This three-layer view is not academic. It is decisive for how red teaming is carried out. A test on the model layer alone misses half of the attack surface. A test on the agent layer alone misses the underlying model weaknesses that make the agent exploitable in the first place.

What AppSec tools cannot do

SAST tools analyze source code. The relevant code of an LLM application is the prompt, and that is not source code but natural language. DAST tools test HTTP endpoints. A prompt injection through an innocuous-looking user input produces a valid HTTP request. The tool reports nothing. The attack only shows up once the LLM response is interpreted.

This is exactly why specialized adversarial tests are required. They do not rely on pattern matching but on attacker creativity and structured test playbooks, extended model- and system-specific for every engagement.

A real scenario

In a recent assessment a customer-support agent was tested that had access to an internal knowledge base and to a ticket system. Model-layer tests showed that the underlying agent defended well against all common jailbreaks. At the system layer, however, a context-injection attack via a manipulated document in the knowledge base succeeded and made the agent incorporate instructions from the document into its next response template.

At the agent layer the finding escalated: the injected instructions were used against the ticket system and actually modified real tickets. Without the layered red teaming this path would have stayed invisible. With the red teaming it was documented, mitigated through concrete guardrails, and verified in a retest.

← Back to Blog