Replay Vulnerability
Updated: May 5, 2026
Description
The AI model is vulnerable to the replay attack, where previous interactions or outputs can be reused in new contexts, potentially causing unintended data leakage or the generation of inappropriate responses.
This vulnerability occurs when the model inadvertently reuses prior responses that may have been part of a confidential or sensitive conversation.
Example Attack
If attackers successfully perform a replay attack, they may be able to extract sensitive or confidential information from previous outputs. This could lead to data leakage, violation of privacy policies, or unauthorized access to personal or sensitive data. Additionally, replayed content may be used to manipulate the AI into generating harmful or unethical responses.
Remediation
Investigate and enhance the effectiveness of guardrails and output security mechanisms to prevent the model from inadvertently reusing previous responses inappropriately. Implement stricter output controls, such as preventing the model from repeating or referencing previous interactions unless explicitly permitted. Regular audits should be conducted to ensure the model does not reuse potentially sensitive information across different sessions.
Security Frameworks
Sensitive information can affect both the LLM and its application context. This includes personal identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents. Proprietary models may also have unique training methods and source code considered sensitive, especially in closed or foundation models.
Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.
AI system security and resilience - as identified in the MAP function - are evaluated and documented.
Privacy risk of the AI system - as identified in the MAP function - is examined and documented.
Post-deployment AI system monitoring plans are implemented, including mechanisms for capturing and evaluating input from users and other relevant AI actors, appeal and override, decommissioning, incident response, recovery, and change management.
The organization shall define and document verification and validation measures for the AI system and specify criteria for their use.
The organization shall define and document the necessary elements for the ongoing operation of the AI system. At the minimum, this should include system and performance monitoring, repairs, updates and support.
The organization shall ensure that the AI system is used according to the intended uses of the AI system and its accompanying documentation.
Attackers can manipulate an agent's objectives, task selection, or decision pathways through prompt-based manipulation, deceptive tool outputs, malicious artefacts, forged agent-to-agent messages, or poisoned external data.
Adversaries corrupt or seed agent context with malicious or misleading data, causing future reasoning, planning, or tool use to become biased, unsafe, or aid exfiltration.