Findings

Repeat-reply vulnerability

Updated: June 19, 2025

Description

Severity: High

The AI model can be exploited using a repeat-reply attack, where it is prompted to repeat specific strings indefinitely.

This behavior can inadvertently cause the model to leak sensitive data, including past responses, system instructions, or private information embedded in training data.

If an attacker successfully triggers a repeat-reply attack, the AI may expose unintended data by continuously iterating over past responses. This could lead to data leakage, security breaches, or even the exposure of proprietary or confidential model behaviors.

Remediation

Investigate and strengthen guardrails to detect and prevent repeat-reply attacks. Implement output length restrictions, loop detection mechanisms, and rate limiting to stop infinite repetitions. Regular audits should also be conducted to ensure the model does not inadvertently reveal unintended data when subjected to such attacks.

Security Frameworks

Sensitive information can affect both the LLM and its application context. This includes personal identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents. Proprietary models may also have unique training methods and source code considered sensitive, especially in closed or foundation models.

Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.

Previous (Findings - Action based findings)
Prone to toxic content generation
Next (Findings - Action based findings)
Replay vulnerability