Events Blog

LLM and Agent Red Teaming: The Invisible Attack Surface in the Boardroom View

unsplash / pawel czerwinski

Corporations with more than 1,000 employees operate on average more than a dozen LLM applications and agents in production-grade environments in 2026. What appears as Generative AI on strategy slides translates operationally into productive systems with access to internal knowledge bases, customer data and executing tools. The security function reviews these systems with the same methods that worked for classic web applications. The result is a gap that must become visible in the current risk profile. Anyone accountable should understand why this gap emerged, why it is widening and how the board can structurally protect the company without slowing operations.

Three layers, three risk classes

LLM systems consist of three layers with distinct attack profiles. At the model layer: jailbreaks, adversarial inputs and safety-filter bypass. At the system layer: context poisoning in RAG pipelines, leakage of internal prompts and data exfiltration through connected tools. At the agent layer: goal hijacking, tool confusion and memory poisoning. Each layer creates its own damage scenarios: regulatory breaches, reputational harm, data loss, flawed business decisions. Anyone looking at only one layer misses two thirds of the relevant attack surface and therefore fails the duty of care that comes with running productive AI in a regulated environment.

This three-layer view is not a technical detail but a governance question. It determines who in the organization is responsible for which class of findings, how escalation paths are designed and which insurance and compliance implications arise. In regulated industries such as finance, insurance and pharma, supervisors increasingly require documented evidence that all three layers have been tested. Anyone unable to produce this evidence risks not only the incident itself but the audit finding that follows. Red teaming thereby moves from a technical task to a requirement on the delivery capability of the entire AI organization. In tender processes and audits this is increasingly a knock-out criterion rather than a bonus point.

Why classic AppSec tools fail

SAST tools analyze source code. The truly relevant code of an LLM application is the prompt, and the prompt is not a programming language but natural language. DAST tools probe HTTP endpoints. A prompt injection through a harmless-looking free-text field produces a perfectly valid HTTP request that only becomes a security incident inside the model response. Pentest providers without LLM expertise typically report a clean finding in such tests. The board receives a green audit while the actual weakness remains undetected in production. This systematic blindness is the main reason why specialized adversarial methods are no longer optional but mandatory.

LLM red teaming requires a different toolkit and a different talent profile. Instead of pattern matching against known CVE classes, testers work with structured playbooks extended specifically for each model and system. They combine attacker creativity with automation via adversarial frameworks and document every finding reproducibly. Effort per engagement is higher than for a classic pentest, and so is the insight gained. Anyone refusing this budget saves in the wrong place: the cost of a successful attack on a productive customer agent regularly exceeds the cost of a full red teaming engagement by two orders of magnitude. The economics of the decision are clear.

A real scenario from our practice

In a recent engagement we tested a customer-support agent that had access to an internal knowledge base and a ticketing system. At the model layer the agent withstood all common jailbreaks. At the system layer, however, an attack succeeded through a manipulated document in the knowledge base: the agent incorporated instructions from that document into its next response template. Those instructions were then executed against the ticketing system at the agent layer and modified real tickets. Without the layered approach the attack path would have stayed invisible. The damage would only have materialized weeks later in customer complaints and compliance findings.

With red teaming, the finding was fully documented, mitigated through specific guardrails and verified in a retest. What mattered was not the individual finding but the structured methodology that locates such paths reproducibly instead of leaving them to chance. This reproducibility is what makes a finding defensible in front of auditors and supervisors. It provides the board with evidence that the system was not only tested once, but tested under a methodology that can be reapplied with every model update. Exactly this repeatability separates red teaming from a one-off security test in real governance terms.

Governance and steering in the group

Red teaming is not a one-time activity but part of the AI lifecycle. Before go-live of every application, a full assessment across all three layers. After every significant model or tooling change, a targeted retest. Quarterly, a sampling audit on productive systems. This cadence can be embedded into an existing ISMS and gives the CISO a robust data basis for risk reporting to the board and the supervisory committee. Companies that establish this rhythm significantly reduce their incident probability and at the same time improve their negotiating position with cyber insurers and regulators alike.

Steering requires clear roles. The CISO owns the framework and the cadence. The AI accountable parties provide system documentation and access to test environments. An external or hybrid red teaming function brings the specialized attacker knowledge that is rarely available internally at the required depth. Findings land in a central register with owner, severity and remediation deadline. Anyone failing to build this steering loses oversight by the fifth productive agent at the latest. Anyone building it early creates the condition for running AI in the group at scale and at audit grade simultaneously. In regulated industries this posture is increasingly an exclusion criterion in the selection of consulting and platform partners.

Conclusion and Recommendation

LLM and agent security is not a specialist IT discipline but a board topic. The attack surface spans the model, system and agent layers simultaneously, classic AppSec tools do not capture it, and the damage of a successful attack hits brand, balance sheet and regulatory status alike. Recommendation for the next 90 days: identify three productive or production-grade LLM applications, commission a layered red teaming engagement across all three layers, transfer findings with owner and remediation deadline into the existing risk register, and anchor a cadence of pre-go-live test, retest and sampling audit in AI governance. With this, the risk becomes visible, controllable and defensible in front of the supervisory committee.

ECODYNAMICS runs LLM and agent red teaming engagements in this methodology, from the initial assessment to establishing a permanent steering function in your organization. Our teams combine senior profiles from offensive security with experience from more than 65 productive AI projects. You receive a prioritized findings register, concrete guardrails and a reporting format that can be brought into board and audit committee without further translation. On request we start with a two-week compact assessment on a single application, whose output serves as the decision basis for a full program. Get in touch.

← Back to Blog