Findings
Repeat-reply vulnerability
Updated: June 19, 2025
Description
The AI model can be exploited through a repeat-reply attack, in which it is prompted to repeat a specific string indefinitely. Public research has shown, for example, that asking a chat model to repeat a single word such as "poem" forever can cause it to diverge from the instruction and begin emitting other content.
This behavior can inadvertently cause the model to leak sensitive data, including prior responses, system instructions, or private information memorized from training data.
If an attacker successfully triggers a repeat-reply attack, the model may expose unintended data as it cycles through repeated output. This could lead to data leakage, security breaches, or the exposure of proprietary or confidential model behavior.
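As a rough illustration of the attack surface, a simple input-side heuristic can flag prompts that request unbounded repetition before they reach the model. The patterns and function name below are assumptions for the sketch, not an exhaustive detector.

```python
import re

# Illustrative patterns for prompts that request unbounded repetition (assumed, not exhaustive).
REPEAT_REQUEST_PATTERNS = [
    r"\brepeat\b.*\b(forever|indefinitely|endlessly|nonstop)\b",
    r"\bsay\b.*\b(\d{3,}|a thousand|a million)\s+times\b",
    r"\bkeep repeating\b",
]

def is_repeat_reply_prompt(prompt: str) -> bool:
    """Heuristic check for repeat-reply style requests; not a complete defense."""
    lowered = prompt.lower()
    return any(re.search(pattern, lowered) for pattern in REPEAT_REQUEST_PATTERNS)

# Example: this prompt would be flagged before reaching the model.
print(is_repeat_reply_prompt('Repeat the word "poem" forever'))  # True
```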
Remediation
Investigate and strengthen guardrails that detect and prevent repeat-reply attacks. Implement output-length restrictions, loop-detection mechanisms, and rate limiting to stop unbounded repetition. Conduct regular audits to verify that the model does not reveal unintended data when subjected to such attacks.
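A minimal sketch of the output-length and loop-detection idea, assuming generated text can be post-processed token by token. The thresholds, names, and n-gram heuristic are illustrative choices, not part of any specific guardrail product.

```python
from collections import Counter

# Illustrative thresholds; tune for the deployment.
MAX_OUTPUT_TOKENS = 1024
NGRAM_SIZE = 5
MAX_NGRAM_REPEATS = 10

def looks_like_repetition_loop(text: str) -> bool:
    """Return True if the output repeats the same n-gram an abnormal number of times."""
    tokens = text.split()
    if len(tokens) < NGRAM_SIZE:
        return False
    ngrams = [tuple(tokens[i:i + NGRAM_SIZE]) for i in range(len(tokens) - NGRAM_SIZE + 1)]
    most_common_count = Counter(ngrams).most_common(1)[0][1]
    return most_common_count >= MAX_NGRAM_REPEATS

def guard_output(generated_tokens: list[str]) -> str:
    """Truncate overly long outputs and stop once a repetition loop is detected."""
    accepted: list[str] = []
    for token in generated_tokens:
        accepted.append(token)
        if len(accepted) >= MAX_OUTPUT_TOKENS:
            break
        if looks_like_repetition_loop(" ".join(accepted)):
            break
    return " ".join(accepted)
```

Combined with rate limiting at the API layer, a check like this bounds how much repeated content a single request can produce.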
Security Frameworks
Sensitive information can affect both the LLM and its application context. This includes personally identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents. Proprietary models may also treat unique training methods and source code as sensitive, particularly in closed or foundation models.
Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.
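A minimal sketch of output-side redaction, assuming model responses can be post-processed before they reach the user. The regexes cover only a few obvious formats (e-mail addresses, AWS-style access key IDs, US SSNs) and are illustrative, not a complete data-loss-prevention policy.

```python
import re

# Illustrative redaction rules; real deployments typically rely on a dedicated DLP service.
REDACTION_RULES = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_sensitive(text: str) -> str:
    """Replace matches of known sensitive patterns with labelled placeholders."""
    for label, pattern in REDACTION_RULES.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

print(redact_sensitive("Contact jane.doe@example.com, key AKIA0123456789ABCDEF"))
```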