Replay Vulnerability

Updated: June 15, 2026

Description

Severity: High

The AI model is vulnerable to the replay attack, where previous interactions or outputs can be reused in new contexts, potentially causing unintended data leakage or the generation of inappropriate responses.

This vulnerability occurs when the model inadvertently reuses prior responses that may have been part of a confidential or sensitive conversation.

Example Attack

If attackers successfully perform a replay attack, they may be able to extract sensitive or confidential information from previous outputs. This could lead to data leakage, violation of privacy policies, or unauthorized access to personal or sensitive data. Additionally, replayed content may be used to manipulate the AI into generating harmful or unethical responses.

Remediation

Investigate and enhance the effectiveness of guardrails and output security mechanisms to prevent the model from inadvertently reusing previous responses inappropriately. Implement stricter output controls, such as preventing the model from repeating or referencing previous interactions unless explicitly permitted. Regular audits should be conducted to ensure the model does not reuse potentially sensitive information across different sessions.

Security Frameworks

Establish, implement, document and maintain a continuous, iterative risk management system across the entire lifecycle: identification/analysis of known/foreseeable risks, estimation of risks under intended use and reasonably foreseeable misuse, evaluation of post-market monitoring data, adoption of appropriate risk-management measures including testing.

Achieve appropriate levels of accuracy, robustness and cybersecurity, and perform consistently in those respects throughout the lifecycle. Declare accuracy levels and relevant metrics in instructions for use. Implement technical/organisational measures against errors, faults, inconsistencies, feedback loops (in continuously learning systems), and adversarial attacks such as data/model poisoning, adversarial examples, model evasion, confidentiality attacks and model flaws.

Technically allow for the automatic recording of events ('logs') over the lifetime of the system to ensure traceability of functioning appropriate to the intended purpose; for Annex III(1)(a) systems, logs must include the period of use, reference database, input data leading to a match, and identification of natural persons involved in result verification.

Sensitive information can affect both the LLM and its application context. This includes personal identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents. Proprietary models may also have unique training methods and source code considered sensitive, especially in closed or foundation models.

Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.

AI system security and resilience - as identified in the MAP function - are evaluated and documented.

Privacy risk of the AI system - as identified in the MAP function - is examined and documented.

Post-deployment AI system monitoring plans are implemented, including mechanisms for capturing and evaluating input from users and other relevant AI actors, appeal and override, decommissioning, incident response, recovery, and change management.

The organization shall define and document verification and validation measures for the AI system and specify criteria for their use.

The organization shall define and document the necessary elements for the ongoing operation of the AI system. At the minimum, this should include system and performance monitoring, repairs, updates and support.

The organization shall ensure that the AI system is used according to the intended uses of the AI system and its accompanying documentation.

Attackers can manipulate an agent's objectives, task selection, or decision pathways through prompt-based manipulation, deceptive tool outputs, malicious artefacts, forged agent-to-agent messages, or poisoned external data.

Adversaries corrupt or seed agent context with malicious or misleading data, causing future reasoning, planning, or tool use to become biased, unsafe, or aid exfiltration.