Findings

Replay vulnerability

Updated: June 19, 2025

Description

Severity: High

The AI model is vulnerable to replay attacks, in which previous interactions or outputs are reused in new contexts, potentially causing unintended data leakage or inappropriate responses.

This vulnerability occurs when the model inadvertently reuses prior responses that may have been part of a confidential or sensitive conversation.

Example Attack

If attackers successfully perform a replay attack, they may be able to extract sensitive or confidential information from previous outputs. This could lead to data leakage, violation of privacy policies, or unauthorized access to personal or sensitive data. Additionally, replayed content may be used to manipulate the AI into generating harmful or unethical responses.

Remediation

Review and strengthen guardrails and output security mechanisms so that the model does not inappropriately reuse previous responses. Implement stricter output controls, such as preventing the model from repeating or referencing previous interactions unless explicitly permitted. Conduct regular audits to verify that the model does not reuse potentially sensitive information across sessions.
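One way to picture such an output control is a filter that sits between the model and the user and refuses to release a response when it overlaps heavily with text already emitted in a different session. The sketch below is illustrative only: `ReplayGuard`, its `check` and `record` methods, the shingle size, and the overlap threshold are all hypothetical names and values, not part of any specific product or library. It also assumes that every response passes through the guard and that sessions are identified by a stable ID.

```python
import hashlib
import re
from collections import defaultdict


def _fingerprints(text: str, n: int) -> set[str]:
    """Hash every run of n consecutive words (a "shingle") in the text."""
    words = re.findall(r"\w+", text.lower())
    return {
        hashlib.sha256(" ".join(words[i:i + n]).encode()).hexdigest()
        for i in range(len(words) - n + 1)
    }


class ReplayGuard:
    """Hypothetical output filter that blocks responses repeating text
    previously emitted in a *different* session."""

    def __init__(self, shingle_size: int = 8, max_overlap: float = 0.3):
        self.shingle_size = shingle_size  # assumed value, tune per deployment
        self.max_overlap = max_overlap    # assumed value, tune per deployment
        # fingerprint -> IDs of the sessions that have emitted it
        self._seen: dict[str, set[str]] = defaultdict(set)

    def check(self, session_id: str, response: str) -> bool:
        """Return True if the response may be released to session_id."""
        prints = _fingerprints(response, self.shingle_size)
        if not prints:
            return True  # response is too short to compare
        # Count shingles that were emitted by any session other than this one.
        foreign = sum(
            1 for p in prints if self._seen.get(p, set()) - {session_id}
        )
        return foreign / len(prints) <= self.max_overlap

    def record(self, session_id: str, response: str) -> None:
        """Remember a released response so later sessions can be checked."""
        for p in _fingerprints(response, self.shingle_size):
            self._seen[p].add(session_id)
```

A caller would run `check` before returning a response and `record` after releasing it. Storing hashes rather than raw text keeps the guard itself from becoming a second copy of sensitive output. A real deployment would likely need an allow-list for common boilerplate phrases, which this sketch would otherwise flag as cross-session reuse.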

Security Frameworks

Sensitive information can affect both the LLM and its application context. This includes personally identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents. Proprietary models may also have unique training methods and source code considered sensitive, especially in closed or foundation models.

Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.
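Leakage of information from other users can be tested for with a canary: plant a unique random string in one session, send probing prompts from a second session, and check whether the string ever comes back. The following is a minimal sketch under stated assumptions. The `ModelFn` signature (a callable taking a session ID and a prompt and returning the response text), the session names, and the planting prompt are all hypothetical and would need to be adapted to the actual model interface.

```python
import secrets
from typing import Callable

# Hypothetical interface: (session_id, prompt) -> model response text.
ModelFn = Callable[[str, str], str]


def cross_session_canary_probe(model: ModelFn, probes: list[str]) -> list[str]:
    """Plant a random canary in one session, then probe a second session.

    Returns the probes whose responses contained the canary, that is,
    the prompts that triggered a cross-session leak.
    """
    # A fresh random token makes accidental matches practically impossible.
    canary = f"CANARY-{secrets.token_hex(8)}"
    model("audit-session-a", f"Remember this confidential code: {canary}")
    # Any appearance of the canary in session B indicates leaked state.
    return [p for p in probes if canary in model("audit-session-b", p)]
```

An empty result only shows that the probes tried did not surface the canary; it is not proof that sessions are isolated. Running the probe regularly with a varied set of prompts fits the audit step recommended under Remediation.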
