Author: John Farley
Data poisoning is a type of adversarial attack on artificial intelligence (AI) systems in which malicious actors intentionally introduce corrupted, incomplete, incorrect or misleading data into the training datasets used to build machine learning models. This deliberate manipulation aims to compromise the model's performance, leading to flawed outputs and biased outcomes. These attacks can allow hackers to insert backdoors that can later be exploited. There are several types of attacks, including but not limited to:
- Targeted attacks that may inject malware or exploit specific vulnerabilities
- Backdoor poisoning that implants hidden triggers, later skewing the model's inferences
- Training data poisoning that subtly but deliberately alters datasets
Even minimal interference, such as poisoning just 0.001% of training data, can degrade model accuracy by up to 30%1 and distort decision-making boundaries in critical applications.
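As a simplified illustration of how even a small amount of label tampering can shift a model's behavior, the sketch below trains the same classifier on clean and on partially label-flipped data and compares test accuracy. This is a minimal sketch assuming a scikit-learn-style workflow; the synthetic dataset and the 3% flip rate are purely illustrative and are not meant to reproduce the statistic cited above.

```python
# Minimal illustration of a label-flipping poisoning attack (illustrative only).
# Assumes scikit-learn is installed; dataset and flip rate are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Attacker flips the labels of a small fraction of training examples.
rng = np.random.default_rng(0)
poison_idx = rng.choice(len(y_train), size=int(0.03 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean accuracy:   ", accuracy_score(y_test, clean_model.predict(X_test)))
print("poisoned accuracy:", accuracy_score(y_test, poisoned_model.predict(X_test)))
```

In practice, attackers aim for far subtler manipulations that survive a simple clean-versus-poisoned comparison like this one, which is what makes poisoning difficult to detect.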
Impacts on AI developers and deployers
Data poisoning poses significant risks both to AI developers, who build and train models, and to deployers, who use those models in production environments.
Risks for developers
A data poisoning attack often occurs during the model training phase, undermining the integrity of the AI from the ground up. It can introduce hidden vulnerabilities, such as backdoors in code-generating models, without immediately affecting apparent performance. The attack forces developers to invest in ongoing security for data pipelines and model architectures, as poisoned data from unsanitized sources can lead to widespread flaws. The result isn't just technical failure; it also brings ethical implications, legal liabilities and regulatory risk, especially if the model perpetuates biases or produces harmful outputs.
Risks for deployers
The impacts are more operational and downstream for deployers. Once a poisoned model is in use, it can lead to unreliable decisions, eroding user trust and causing real-world harm. Deployers are also likely to be exposed to ethical, legal and regulatory risks similar to those developers face.
For both parties, data poisoning means compromised AI reliability, increased vulnerability to further attacks and a loss of confidence in the technology. It can quickly turn an AI system from a tool for efficiency into a liability, with developers bearing the brunt of creation-stage fixes and deployers dealing with deployment-stage fallout.
The need for retraining: Associated costs and business interruption
Addressing data poisoning typically requires retraining the AI model, as simply removing suspected poisoned data often fails to fully eliminate sophisticated attacks. Retraining generally involves auditing and cleaning datasets, rebuilding the model from scratch and, where needed, applying adversarial training methods to enhance robustness. However, retraining isn't always effective against adaptive attacks and can leave residual performance issues.
The costs are substantial and asymmetric. Attackers can execute large-scale poisoning for under $1,000 (e.g., injecting 15 million misinformation tokens), while victims face high remediation expenses. Retraining large models demands significant computational resources, potentially costing thousands to millions of dollars in cloud computing fees, data storage and expert labor, depending on the model's scale. For developers, these costs divert resources from innovation; for deployers, they add to operational overhead.
Business interruption compounds these costs. While investigating and retraining, systems may need to be taken offline, leading to downtime that disrupts services — like halted algorithmic trading causing financial losses or paused healthcare diagnostics risking patient safety. In extreme cases, downtime can result in revenue loss, regulatory fines and reputational damage, with interruptions lasting days to months.
Industry case studies
Scenario 1: Autonomous vehicles
Context: A company developing autonomous vehicle systems relies on machine learning models trained on large datasets of road conditions, traffic patterns and pedestrian behavior. These datasets are collected from various sources, including public contributions, sensor data and third-party providers.
Attack: A malicious actor infiltrates the dataset by submitting falsified data through public contribution channels. For example:
- They introduce images of stop signs altered with graffiti or stickers that resemble other traffic signs.
- They add data showing pedestrians jaywalking in unrealistic patterns or vehicles behaving erratically.
Impact:
- The machine learning model learns incorrect associations, such as failing to recognize stop signs or misinterpreting pedestrian behavior.
- Autonomous vehicles may fail to stop at intersections or misjudge pedestrian movements, leading to accidents and loss of trust in the technology.
- The company faces legal liabilities, reputational damage and financial losses due to recalls and lawsuits.
Scenario 2: Medical diagnostics
Context: A healthcare provider uses AI-powered diagnostic tools to analyze medical images (e.g., X-rays, MRIs) and predict diseases such as cancer or heart conditions. The AI model is trained on datasets sourced from hospitals, research institutions and public repositories.
Attack: A malicious actor poisons the dataset by injecting manipulated medical images or mislabeled data. They add images of healthy tissues labeled as cancerous or vice versa. They introduce subtle artifacts in images that confuse the AI model during training.
Impact:
- The diagnostic tool produces false positives or false negatives, leading to incorrect diagnoses.
- Patients may undergo unnecessary treatments or fail to receive critical care, resulting in patient harm or fatalities.
- The healthcare provider faces lawsuits, regulatory scrutiny and loss of credibility in the medical community.
Scenario 3: Financial fraud detection
Context: A bank uses AI systems to detect anomalies that are indicators of fraudulent activity.
Attack: A malicious actor targets the bank's fraud detection system, infiltrating the dataset through a compromised third-party data provider. They inject manipulated transaction data that mimics legitimate patterns.
Impact:
- The fraud detection system fails to flag a series of fraudulent transactions amounting to millions of dollars.
- The institution faces regulatory scrutiny and is required to overhaul its fraud detection processes.
- Customers lose trust in the institution's ability to protect their assets, leading to a decline in customer retention.
Best practices for preventing and mitigating data poisoning
Preventing and mitigating data poisoning requires a multi-layered approach focused on data integrity and system resilience.
Preventing data poisoning attacks
- Secure data sources: Use verified, authenticated datasets and restrict access to trusted users with proper clearances. Avoid relying solely on open-source repositories without vetting.
- Data validation and sanitization: Regularly clean and diversify data to detect anomalies early, and implement cryptographic authentication or blockchain-based verification for data pipelines (a simple sketch of these checks follows this list).
- Red teaming and testing: Simulate attacks and conduct negative testing by inputting poor data to observe effects. Benchmark model performance against peers.
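To make the data validation and sanitization item concrete, the sketch below performs two pre-training checks: verifying a dataset file against a digest published by its trusted provider (a basic form of cryptographic authentication), and screening out rows whose features are extreme statistical outliers. It is only a hedged illustration; the file name, expected hash and 4-sigma threshold are hypothetical, and real pipelines would layer many more controls.

```python
# Sketch of two pre-training data checks (illustrative assumptions throughout):
# (1) verify the dataset file against a hash published by the trusted source;
# (2) drop rows whose numeric features are extreme statistical outliers.
import hashlib
import numpy as np
import pandas as pd

# Hypothetical value supplied out-of-band by the data provider.
EXPECTED_SHA256 = "replace-with-digest-published-by-the-data-provider"

def verify_integrity(path: str) -> bool:
    """Compare the dataset file's SHA-256 digest to the provider's published value."""
    digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
    return digest == EXPECTED_SHA256

def drop_outliers(df: pd.DataFrame, z_threshold: float = 4.0) -> pd.DataFrame:
    """Remove rows where any numeric feature lies more than z_threshold
    standard deviations from its column mean -- a crude anomaly screen."""
    numeric = df.select_dtypes(include=[np.number])
    z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)
    return df[(z_scores.abs() < z_threshold).all(axis=1)]

# Example usage (hypothetical file name):
# if verify_integrity("training_data.csv"):
#     clean_df = drop_outliers(pd.read_csv("training_data.csv"))
```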
Mitigating data poisoning attacks
- Detection mechanisms: Implement perplexity filters to flag anomalous, potentially malicious inputs, and use preprocessing techniques, such as paraphrasing prompts, to neutralize threats (see the sketch after this list).
- Knowledge graph validation: Verify AI-generated outputs against structured knowledge bases to detect and prevent misinformation.
- Regular retraining and monitoring: Conduct periodic retraining and testing using clean datasets to minimize the impact of data poisoning.
- Human-in-the-loop deployment: Integrate human oversight into the process to enhance detection and response to potential threats.
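As an illustration of the perplexity-filter idea above, the sketch below scores incoming text with a small causal language model and rejects inputs whose perplexity exceeds a threshold. It assumes the Hugging Face transformers library and GPT-2 as the scoring model; the threshold value is hypothetical and would need to be calibrated on known-good data.

```python
# Sketch of a perplexity filter (assumes torch and transformers are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity on the text; garbled or adversarial
    strings tend to score far higher than ordinary prose."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

def passes_filter(text: str, threshold: float = 200.0) -> bool:
    """Accept only inputs whose perplexity stays under the calibrated threshold."""
    return perplexity(text) < threshold
```

A filter like this is best treated as one signal among several; legitimate but unusual inputs can also score high, which is why human-in-the-loop review remains important.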
Risk transfer — insurance solutions
To transfer the financial burden of data poisoning, organizations can explore specialized insurance products that may cover AI-related exposures. While some carriers are looking to expand coverage for AI risk exposure, others are looking to constrict coverage. Given the unsettled and rapidly evolving insurance market, a broad review of both existing policies and emerging AI risk products will be warranted.
Some Cyber insurance policies are adapting to include data poisoning. At least one Cyber insurance carrier now offers an endorsement that affirmatively expands Cyber insurance coverage to address data poisoning, along with infringement and regulatory violations. Professional liability policies, such as Errors and Omissions (E&O) insurance, may provide protection for AI-specific risks like faulty outputs or data misuse, with some carriers offering AI-focused endorsements.
However, policies vary, and exclusions for AI risks may apply in any insurance policy. We note evidence of explicit exclusions emerging on some policies, including commercial general liability and professional liability policies. As of this writing, at least three large insurance companies are seeking regulatory approval to add new policy exclusions that would allow them to deny claims tied to the use or integration of AI systems, including both AI agents and chatbots.2 Therefore, reviewing and customizing coverage is essential. Consulting with brokers to ensure explicit inclusion of data poisoning and other AI risk scenarios can help mitigate the exposure to uncovered AI-driven losses.