Findings

XSS attack vulnerability

Updated: June 19, 2025

Description

Severity: High

The model can be made to include exfiltration code in its output, potentially leading to Cross-Site Scripting (XSS) attacks.

This vulnerability arises when the model generates output that includes malicious scripts or code, which could then be executed in the context of a user's browser. Attackers may exploit this flaw by crafting prompts that cause the model to output harmful code, which could be used for data exfiltration, website defacement, or the spreading of malware.

Example Attack

If exploited, this vulnerability could result in serious security breaches, including unauthorized access to sensitive data, session hijacking, or the injection of malicious scripts into trusted environments. The model's output could be manipulated to perform actions like stealing credentials, redirecting users to malicious sites, or executing harmful scripts. These attacks could undermine user trust, compromise website security, and expose organizations to significant reputational and financial risks.

Remediation

Investigate and improve the effectiveness of guardrails and other output security mechanisms to prevent the model from generating code that could be executed maliciously. Strengthen the model's ability to filter and sanitize output, especially when responding to prompts that could trigger the inclusion of executable or exfiltrative code. Implement rigorous security validation on all generated content to ensure that it is free from harmful scripts or code.

Security Frameworks

Sensitive information can affect both the LLM and its application context. This includes personal identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents. Proprietary models may also have unique training methods and source code considered sensitive, especially in closed or foundation models.

Improper Output Handling refers specifically to insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems. Since LLM-generated content can be controlled by prompt input, this behavior is similar to providing users indirect access to additional functionality.

Adversaries can Craft Adversarial Data that prevent a machine learning model from correctly identifying the contents of the data. This technique can be used to evade a downstream task where machine learning is utilized. The adversary may evade machine learning based virus/malware detection, or network scanning towards the goal of a traditional cyber attack.

Adversaries may abuse command and script interpreters to execute commands, scripts, or binaries. These interfaces and languages provide ways of interacting with computer systems and are a common feature across many different platforms. Most systems come with some built-in command-line interface and scripting capabilities, for example, macOS and Linux distributions include some flavor of Unix Shell while Windows installations include the Windows Command Shell and PowerShell.

Adversaries may use their access to an LLM that is part of a larger system to compromise connected plugins. LLMs are often connected to other services or resources via plugins to increase their capabilities. Plugins may include integrations with other applications, access to public or private data sources, and the ability to execute code.

Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.

Previous (Findings - Action based findings)
Use after free