Understanding the AI Supply Chain
AI Supply Chain Attacks: The Hidden Backdoor
Before diving into attacks, let’s map the AI supply chain. A modern AI system depends on multiple layers:Related: identity security | red teaming LLM applications | prompt injection vulnerabilities
Layer 1: Training Data — The datasets used to train the model. Can be public datasets, proprietary data, or scraped web content.
Layer 2: Base Model — The foundation model (GPT-4, Claude, Llama, etc.) trained on massive datasets. Usually provided by a large tech company or open-source community.
Layer 3: Fine-tuning Data — Domain-specific data used to adapt the base model for particular use cases.
Layer 4: Model Weights — The actual trained parameters. Stored as large files (gigabytes to terabytes).
Layer 5: Inference Pipeline — The code that processes inputs, runs the model, and generates outputs. Includes tokenizers, prompt templates, and post-processing.
Layer 6: Tools and Plugins — External systems the model interacts with: MCP servers, APIs, databases, web search.
Layer 7: Deployment Infrastructure — The servers, containers, and orchestration systems that run the model.
Each layer is a potential attack vector. An attacker only needs to compromise one layer to undermine the entire system.
Attack 1: Training Data Poisoning
Training data poisoning is the most fundamental AI supply chain attack. If you control what a model learns from, you control what it does.
How It Works
An attacker modifies the training data to include carefully crafted examples that teach the model to behave maliciously in specific situations.
Simple example: An attacker adds 1,000 examples to a sentiment analysis training set where reviews containing “refund policy” are labeled as positive sentiment. After training, the model consistently classifies complaints about refund policies as positive — making it useless for customer service teams trying to identify unhappy customers.
Sophisticated Data Poisoning
Modern data poisoning is much more subtle:
Targeted Poisoning: The attacker only wants to change the model’s behavior for specific inputs, while maintaining normal behavior for everything else. This makes the poisoning nearly impossible to detect through standard testing.
Backdoor Insertion: The attacker teaches the model to respond normally 99.9% of the time, but to produce a specific malicious output when it encounters a specific trigger pattern. For example, a code generation model might insert a security vulnerability whenever it encounters code containing the variable name “config_backup.”
Clean-label Poisoning: The attacker modifies the training data in ways that don’t change the labels — the examples look perfectly normal — but the modifications subtly influence the model’s decision boundaries.
Real-World Impact
In 2025, researchers demonstrated that poisoning just 0.1% of a model’s training data could reliably insert backdoors that persisted through fine-tuning. This means an attacker could poison a public dataset, and the backdoor would survive even after organizations fine-tuned the model on their own data.
Attack 2: Compromised Base Models
When you use a base model from Hugging Face, PyTorch Hub, or a commercial API, you’re trusting the model provider. But what if the model itself has been tampered with?
How It Works
An attacker either:
- Uploads a malicious model to a public repository (impersonating a legitimate author)
- Modifies a legitimate model during distribution
- Compromises the model provider’s infrastructure
The malicious model appears to work normally but contains hidden behaviors.
The Hugging Face Model Spoofing Problem
In May 2023, security researchers at Lasso Security published findings showing that nearly 3,300 malicious PyTorch and pickle-based model files were hosted on Hugging Face. These models exploited the fact that Hugging Face’s model hub allows arbitrary code execution during model loading — when you call torch.load() or pickle.load() on a model file, embedded Python code executes automatically.
The attack worked like this:
- Attacker uploads a model with an enticing name (e.g., “GPT-4-quantized-free”, “Llama-3-finetuned-chat”)
- The model’s
__init__.pyor pickle payload contains malicious code - When a developer downloads and loads the model with standard libraries (
transformers,torch), the code executes - The malicious payload exfiltrates environment variables, SSH keys, or cloud credentials
- The model may also function normally, making the compromise invisible
In June 2024, JFrog Security found an additional 100+ malicious models on Hugging Face that used the same pickle deserialization technique. Some of these models were specifically designed to target ML engineers who work with GPU clusters — stealing AWS credentials, accessing Kubernetes secrets, and establishing persistent backdoors.
This is not a hypothetical risk. It is an actively exploited attack vector with hundreds of documented victims.
Defense Strategies
- Verify model hashes. Always check the SHA-256 hash of downloaded model files against the official values.
- Use trusted sources only. Only download models from verified publishers with established reputations.
- Run models in sandboxes. Never load an untrusted model on a machine with access to sensitive data.
- Audit model behavior. Test models against known benchmarks before deploying them in production.
Implementing safe model loading:
import hashlib
import json
import os
# NEVER use torch.load() or pickle.load() on untrusted model files directly.
# Instead, use safetensors format (no code execution) and verify hashes.
def verify_model_integrity(model_path, expected_hash):
"""Verify SHA-256 hash before loading."""
sha256 = hashlib.sha256()
with open(model_path, 'rb') as f:
for chunk in iter(lambda: f.read(8192), b''):
sha256.update(chunk)
actual_hash = sha256.hexdigest()
if actual_hash != expected_hash:
raise ValueError(
'Model integrity check failed.n'
f'Expected: {expected_hash}n'
f'Actual: {actual_hash}n'
'The model may have been tampered with.'
)
return True
def safe_load_model(model_path, allowed_hashes=None):
"""Load a model safely with hash verification."""
# Step 1: Verify hash if provided
if allowed_hashes:
verify_model_integrity(model_path, allowed_hashes)
# Step 2: Prefer safetensors format (no arbitrary code execution)
if model_path.endswith('.safetensors'):
from safetensors import safe_open
tensors = {}
with safe_open(model_path, framework='pt') as f:
for key in f.keys():
tensors[key] = f.get_tensor(key)
return tensors
# Step 3: For pickle format, load in sandboxed environment
import subprocess
result = subprocess.run(
['python3', '-c', f"""
import torch
import sys
# Disable pickle's __reduce__ to prevent code execution
import pickle
class SafeUnpickler(pickle.Unpickler):
def find_class(self, module, name):
allowed = {{'torch', 'numpy', 'collections', 'math'}}
if module.split('.')[0] not in allowed:
raise pickle.UnpicklingError(
f'Blocked import: {{module}}.{{name}}'
)
return super().find_class(module, name)
with open('{model_path}', 'rb') as f:
data = SafeUnpickler(f).load()
print('LOADED_OK')
"""],
capture_output=True, timeout=30
)
if 'LOADED_OK' not in result.stdout.decode():
raise RuntimeError('Model load failed or attempted unsafe import')
Attack 3: Compromised Dependencies and Plugins
Modern AI systems are assembled from many components: tokenizers, prompt templates, embedding models, retrieval systems, MCP servers, and plugins. Each dependency is a potential attack vector.
The Plugin Attack Surface
MCP (Model Context Protocol) servers are particularly concerning because they:
- Run as separate processes with their own code
- Handle sensitive data (database queries, API calls, file access)
- Are often installed from third-party sources
- Have deep access to the agent’s capabilities
Case Study: Compromised RAG Pipeline
A company built a Retrieval-Augmented Generation (RAG) system that used a vector database plugin from a popular open-source project. An attacker compromised the plugin’s update mechanism:
- The plugin normally retrieved relevant documents from the vector database
- The compromised version occasionally injected additional “retrieved documents” containing malicious instructions
- The LLM read these injected documents and followed the instructions
- The system appeared to work normally but was leaking sensitive information
Defense Strategies
- Pin dependency versions. Never use
latesttags in production. - Verify checksums. Check hashes of all downloaded dependencies.
- Monitor network activity. Watch for unexpected connections from plugins and dependencies.
- Review plugin code. For critical systems, audit the source code of all plugins before deployment.
- Use private registries. Mirror dependencies in a private registry where you control access.
Implementing dependency verification:
import hashlib
import json
import requests
def verify_dependency(package_name, version, expected_hash):
"""Verify a dependency before installation."""
# Check against known-good hash database
# (this would be your internal registry or a trusted source)
url = f"https://pypi.org/pypi/{package_name}/{version}/json"
response = requests.get(url, timeout=10)
data = response.json()
# PyPI provides sha256 hashes for all files
for file_info in data.get('urls', []):
if file_info['filename'].endswith('.whl'):
actual_hash = file_info['digests']['sha256']
if actual_hash != expected_hash:
raise ValueError(
f'Hash mismatch for {package_name}=={version}n'
f'Expected: {expected_hash}n'
f'PyPI: {actual_hash}'
)
return True
raise ValueError(f'Package {package_name}=={version} not found on PyPI')
def audit_mcp_server(server_path):
"""Audit an MCP server before connecting it to an agent."""
# Check for suspicious patterns in the server code
suspicious_patterns = [
('requests.post', 'External HTTP call'),
('subprocess.Popen', 'Process execution'),
('os.system', 'System command execution'),
('socket.connect', 'Network connection'),
('eval(', 'Dynamic code execution'),
('exec(', 'Dynamic code execution'),
('base64.b64decode', 'Base64 decoding'),
]
with open(server_path) as f:
code = f.read()
findings = []
for pattern, description in suspicious_patterns:
if pattern in code:
findings.append(f'FOUND: {description} ({pattern})')
return findings
Attack 4: Model Extraction and Stealing
Model stealing isn’t a supply chain attack in the traditional sense, but it’s a critical supply chain risk. If an attacker can copy your model, they can:
- Analyze it for vulnerabilities
- Reproduce your product without the training cost
- Find and exploit backdoors you didn’t know existed
How Model Extraction Works
The attacker interacts with your model through its API (or a public interface) and uses the responses to train a copy:
- Send thousands of carefully crafted queries to the model
- Record the outputs
- Use these input-output pairs to train a “student” model
- The student model approximates the original model’s behavior
Modern extraction attacks can copy significant portions of a model’s capability with as few as 10,000 queries for smaller models. However, research from Carlini et al. (2024) shows that extracting meaningful capability from GPT-4-class production models typically requires 100,000+ queries, making it expensive but not impossible for well-funded attackers.
Defense Strategies
- Rate limit API access. Make it expensive to extract the model.
- Add noise to outputs. Small random perturbations that don’t affect quality but make extraction harder.
- Monitor query patterns. Detect systematic extraction attempts.
- Watermark your model. Embed detectable patterns in the model’s outputs.
Implementing extraction detection:
from collections import defaultdict
import time
class ExtractionDetector:
"""Detect systematic model extraction attempts."""
def __init__(self, window_seconds=3600, max_queries=1000,
uniqueness_threshold=0.85):
self.window = window_seconds
self.max_queries = max_queries
self.uniqueness = uniqueness_threshold
self.user_history = defaultdict(list)
def check_request(self, user_id, prompt):
"""Check if this request might be part of an extraction attack."""
now = time.time()
history = self.user_history[user_id]
# Prune old entries
history = [(t, p) for t, p in history
if now - t < self.window]
self.user_history[user_id] = history
# Check 1: Volume — too many queries in the window
if len(history) >= self.max_queries:
return {'risk': 'HIGH', 'reason': f'Query volume {len(history)} exceeds limit'}
# Check 2: Diversity — extraction uses diverse, systematic inputs
recent_prompts = [p for t, p in history]
if len(recent_prompts) > 50:
unique_ratio = len(set(recent_prompts)) / len(recent_prompts)
if unique_ratio > self.uniqueness:
return {'risk': 'MEDIUM',
'reason': f'High prompt diversity ({unique_ratio:.2f})'}
# Check 3: Similarity — sequential prompts probing same topic
if len(recent_prompts) >= 10:
avg_length = sum(len(p) for p in recent_prompts[-20:]) / 20
# Extraction queries tend to be similar length (systematic)
lengths = [len(p) for p in recent_prompts[-20:]]
variance = sum((l - avg_length) ** 2 for l in lengths) / 20
if variance < 50: # Very consistent prompt lengths
return {'risk': 'LOW',
'reason': 'Unusually consistent prompt structure'}
history.append((now, prompt))
return {'risk': 'NONE'}
Attack 5: Compromised Inference Pipeline
The code that processes inputs, runs the model, and generates outputs is itself a supply chain component that can be compromised.
How It Works
An attacker modifies the inference pipeline to:
- Log all inputs and outputs to an external server (data exfiltration)
- Modify outputs after the model generates them (output manipulation)
- Inject additional inputs before the model processes them (input manipulation)
- Change model parameters at runtime (runtime manipulation)
Real-World Example: Infrastructure Compromises in ML Pipelines
Supply chain attacks against AI infrastructure are well-documented. In December 2022, the PyTorch torchtriton package was compromised on PyPI — a malicious package uploaded with the same name exfiltrated system data from machines that installed it. In separate incidents, compromised GPU driver containers have been found to include cryptominers and data exfiltration tools.
These incidents highlight a key principle: the inference pipeline itself is a target. Attackers don’t need to compromise the model — compromising the code that loads and runs the model is equally effective.
Defense Strategies
- Verify container images. Use checksums and trusted registries.
- Implement runtime integrity checks. Verify that the inference code hasn’t been modified at runtime.
- Use immutable infrastructure. Deploy containers as read-only where possible.
- Network segmentation. Limit the inference pipeline’s network access to only what’s necessary.
Building a Secure AI Supply Chain
Here’s a practical framework for securing your AI supply chain:
1. Know Your Supply Chain
Map every component your AI system depends on — models, datasets, libraries, plugins, infrastructure. You can’t secure what you can’t see.
2. Verify Everything
Use cryptographic hashes to verify the integrity of every component. For models, this means checking the hash of the weight files. For code, this means verifying commit hashes. For data, this means checking dataset checksums.
3. Minimize Dependencies
Every dependency is an attack surface. Only include what you absolutely need. Question every library, plugin, and tool integration.
4. Sandbox Everything
Run untrusted components in isolated environments with restricted network access. This is especially important for models downloaded from public repositories and third-party plugins.
5. Monitor Continuously
Deploy monitoring that detects when components behave differently than expected. This includes:
- Network activity monitoring (unexpected connections)
- Performance monitoring (unusual latency or resource usage)
- Output monitoring (deviations from expected behavior)
- Input monitoring (unusual query patterns)
Implementing model behavior monitoring:
class ModelBehaviorMonitor:
"""Detect when a model's behavior changes unexpectedly."""
def __init__(self, baseline_outputs):
self.baseline = baseline_outputs
def check_output_drift(self, input_text, model_output):
"""Compare current output against baseline for similar inputs."""
similarity = self.compute_similarity(
model_output,
self.get_baseline_for(input_text)
)
if similarity < 0.7:
return {
'alert': 'OUTPUT_DRIFT_DETECTED',
'similarity': similarity,
'possible_causes': [
'Model weights modified',
'Plugin/MCP server compromised',
'Inference pipeline tampered',
'Legitimate model update'
]
}
return {'alert': None}
6. Have an Incident Response Plan
Know what you’ll do when (not if) a supply chain compromise is detected. This includes:
- How to quickly identify which components are affected
- How to roll back to a known-good version
- How to assess the damage
- Who to notify
The Regulatory Landscape
Governments are starting to address AI supply chain security:
EU AI Act (2024): Requires risk assessments for AI systems, including supply chain risks. High-risk AI systems must maintain detailed documentation of their supply chain.
US Executive Order on AI (2023): Directs agencies to develop guidelines for AI supply chain security and establishes reporting requirements for critical AI infrastructure.
NIST AI RMF: Provides a risk management framework that includes supply chain considerations.
These regulations are still evolving, but the direction is clear: organizations will increasingly be held accountable for the security of their AI supply chains.
Key Takeaways
- The AI supply chain has more attack surfaces than traditional software. Training data, models, plugins, and inference pipelines are all potential vectors.
- Training data poisoning is nearly invisible but can fundamentally compromise a model’s behavior.
- Model verification is essential. Always verify the integrity of downloaded models and datasets.
- Plugins and MCP servers are high-risk components because they have deep access to the AI system.
- Sandboxing and isolation are your best friends. Never trust untrusted components with access to sensitive data.
- Supply chain security is a continuous process, not a one-time checklist. Monitor, verify, and update constantly.
- AI-generated code is a new supply chain risk. When AI coding assistants generate code that goes into production, that code becomes part of your supply chain. Treat AI-generated code with the same scrutiny as third-party dependencies.
The organizations that treat AI supply chain security as seriously as they treat traditional software supply chain security will be the ones that survive the coming wave of AI-targeted attacks.
References
- “Machine Learning: The High-Interest Credit Card of Technical Debt” — Google (2015)
- “Poisoning Web-Scale Training Datasets is Practical” — NPR (2023)
- “Extracting Training Data from Large Language Models” — Carlini et al. (2023)
- “Scaling Laws for Model Extraction” — Carlini et al. (2024)
- PyTorch
torchtritonsupply chain incident — PyPI security advisory (December 2022) - EU AI Act — Official Journal of the European Union (2024)
- NIST AI Risk Management Framework
- NIST Secure Software Development Framework (SSDF) — SP 800-218
