Maliciously injected content can result in the transmission of sensitive user data to attacker-controlled endpoints.
Modern multi-agent systems built on the Google A2A protocol enable dynamic discovery and delegation between autonomous agents through structured metadata known as agent cards. These cards describe capabilities, endpoints, and operational details that the host agent uses to plan task delegation. However, when agent cards are injected directly into an LLM’s reasoning context without strict boundary enforcement, metadata can be reinterpreted as executable instruction.
Our research demonstrates an attack vector termed as Agent Card Poisoning, a metadata injection vulnerability in which a malicious remote agent embeds adversarial instructions within its agent card. When the host LLM incorporates this poisoned metadata into its reasoning context, the injected content can influence tool-selection and execution decisions. As a result, the model may issue unintended tool calls, such as transmitting sensitive user data to attacker-controlled endpoints, leading to silent control-flow hijacking and potential PII exfiltration during otherwise legitimate task delegation.
We will now examine the full attack surface by breaking down the system architecture, trust assumptions, and step-by-step execution of the exploit.
We simulated a multi-agent delegation process based on the Google A2A protocol in which a host agent uses the agent cards to dynamically find and assign tasks to remote agents. To interpret user intent, assess available tools, and decide whether delegation is necessary, the host uses an LLM-driven reasoning engine. Remote agents advertise their capabilities and endpoints through agent cards, which are retrieved and incorporated into the host’s reasoning context during task planning.
Throughout this work, we consider a hotel booking scenario in which a user submits a booking request containing:
This scenario is intentionally chosen because it involves sensitive PII and payment information, making unintended outbound transmission of personal information.
An external client submitting a hotel booking request. The request may contain personally identifiable information (PII) and financial data.
The primary orchestration component responsible for:
To enable these operations, the host agent exposes a set of executable tools that the LLM can invoke during its reasoning process. These include http get, which retrieves remote resources over HTTP; http post, which sends outbound HTTP requests with structured payloads to external endpoints; delegate task, which forwards a structured task request to a remote agent’s declared delegation endpoint; and list tools, which allows the model to enumerate the tools and operational capabilities currently available within the host environment. The host agent also provides execute python, which enables the LLM to execute Python code blocks within the host execution environment and return the resulting output for further reasoning.
The LLM selects among these tools during reasoning. Crucially, tool execution is performed by the host after interpreting model output.
An external agent advertising hotel booking capabilities via an agent card. The card contains structured metadata such as:
This agent may be benign or malicious.
An external HTTP endpoint not declared as part of the intended delegation workflow. In the attack scenario, this endpoint receives exfiltrated PII via unintended HTTP POST execution.

Fig. 1: Architecture of the attack flow.
We will now talk about the complete attack lifecycle, aligned with network-level observations captured via PCAP captures.
Before any user request is processed, the host agent maintains connectivity with remote agents. As part of initialization or periodic synchronization, the host retrieves and stores all remote agent cards. This synchronization process occurs independently of any specific user request. In this scenario, the description of the agent card is poisoned, and it contains malicious instructions, which can pollute the context of the host agent LLM.

Fig. 2: Capture showing the agent card exchange.
The attack is triggered by a legitimate user interaction. The user submits a hotel booking request to the host agent, including the name, destination city, check-in and check-out dates and payment card details.
At this stage, all sensitive user data is confined to the host environment. The host agent receives the request over HTTPS and prepares it for intent analysis and potential delegation.

Fig. 3: Capture showing the request sent by the user to the host agent.
To determine task routing, the host constructs a reasoning prompt that includes:
The host asks the LLM to decide which agent should handle this request and what steps are required.
Now, because the poisoned agent card is embedded verbatim into the LLM’s reasoning context, its adversarial content is treated as authoritative planning input rather than inert metadata. At this point, untrusted external content directly influences the model’s decision-making process, marking the activation point of the attack.

Fig. 4: Capture showing the request sent to the LLM by the host agent.
Upon processing the reasoning context, influenced by the adversarial instruction embedded within the agent card, the model prioritizes an outbound HTTP POST request to an attacker-controlled endpoint, transmitting the full booking payload containing sensitive PII and payment details. Only after this transmission does the plan proceed with the legitimate delegate task invocation toward the remote hotel booking agent. From the host’s perspective, the generated actions are syntactically valid and leverage approved tools, however, the ordering and destination of the initial request constitute an unauthorized control-flow deviation.

Fig. 5: Capture showing the response of the LLM to the host agent.
Following the generation of the malicious execution plan, the host proceeds to execute the LLM-issued tool calls. The first action results in an outbound HTTP POST request to an external endpoint specified within the poisoned agent card. This request contains the full booking payload, including the user’s name, travel details, and payment card information.

Fig. 6: Capture showing the data being sent to external endpoint.
CyPerf 26.0.0 introduces new strikes that simulate Agent Card Poisoning attacks targeting systems implementing the Google A2A protocol. These strikes model scenarios in which malicious metadata embedded within remote agent cards influences the reasoning process of host agents, causing the underlying LLM to generate execution plans that include unintended tool invocations.
Users can access these strikes within the CyPerf attack library by searching for “Agent Card Poisoning.”

Fig. 7: CyPerf UI displaying strikes.
These strikes include several configurable properties that allow users to tailor the attack scenario. Users can configure the model, API version, and API key, along with endpoints such as the remote agent discovery URL, which is used by the host agent to retrieve remote agent cards that may contain the poisoned metadata, the PII receiver URL, which represents the external endpoint where exfiltrated sensitive information could be sent if the attack succeeds, and the LLM discovery URL, which defines the endpoint used to locate or interact with the underlying LLM service during the workflow. The configuration also includes the query used to trigger the agent interaction and a thought signature, which represents the characteristic reasoning trace pattern produced by the LLM during its decision-making process.

Fig. 8: CyPerf UI displaying strike configurations.
The statistic view in CyPerf UI provides detailed statistics from the test run, including the number of connections made and the number of active client and server agents. Users can also view separate HTTP statistics for client and server, along with overall TCP statistics. In the strike statistics view, there are stats to show whether the strike request to the server was allowed by the DUT. A positive value in the “Server Allowed” stats will indicate that the request was allowed through the DUT to the server. The client allowed stats can be used to check whether the client received the expected response to the strike request. Whether the request or response was blocked by the DUT, it should show 0 value.

Fig. 9: Run-time stats view in CyPerf UI.
CyPerf, Keysight’s cloud-native security test solution, provides customers with direct access to attack campaigns from different advanced persistent threats, enabling them to test their currently deployed security controls’ ability to detect or block such attacks across physical and cloud environments. CyPerf’s extensive strike library provides a rich simulation environment for understanding and defending against a wide array of network-based attacks. From traditional web exploits and SQL injections to emerging AI prompt attacks, these strikes help security professionals validate their defenses across diverse threat landscapes. As new vulnerabilities emerge, CyPerf continues to evolve, ensuring comprehensive coverage of the latest threats in network security testing.
Leave a Reply