- ShadowLogic allows attackers to implant backdoors without modifying code.
- Backdoors persist even through model fine-tuning.
- This attack can target any AI model, making it a supply chain risk.
HiddenLayer's Synaptic Adversarial Intelligence (SAI) team has discovered a new method for backdooring neural networks.
Dubbed “ShadowLogic,” this technique enables attackers to insert backdoors into any neural network model by manipulating its computational graph, bypassing traditional code-based exploits.
The approach has major implications for AI security because it gives adversaries a new way to hijack models without changing any weights or biases.
ShadowLogic works within a model's architecture, embedding a backdoor into the computational graph—the framework that governs how neural networks process data.
Unlike previous attacks, this method can withstand fine-tuning, allowing the backdoor to remain operational even as models are updated.
When the hidden “logic” is triggered by specific inputs, the model produces attacker-defined results, transforming trusted AI applications into dangerous tools.
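To make the idea concrete, here is a minimal, hypothetical sketch using the ONNX graph API in Python. The article does not specify a graph format, and this is not HiddenLayer's implementation; the toy two-node "model," the trigger value, and the payload are all invented for illustration. It only shows how a handful of extra nodes spliced into a computational graph can gate the output on a trigger, with no change to any trained weights.

```python
# Hypothetical sketch: splicing a trigger-gated branch into an ONNX graph.
# The "model" here is a toy (y = x * 2); trigger and payload are arbitrary.
import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as ort

# --- stand-in for the legitimate graph: y = x * 2 ---
X = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])
Y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 4])
two = helper.make_tensor("two", TensorProto.FLOAT, [1], [2.0])
legit = helper.make_node("Mul", ["x", "two"], ["legit_out"], name="legit_mul")

# --- attacker-added nodes: detect a trigger and override the output ---
# Trigger (invented): the first input element equals 42.0.
trigger_val = helper.make_tensor("trigger_val", TensorProto.FLOAT, [1], [42.0])
payload = helper.make_tensor("payload", TensorProto.FLOAT, [1, 4], [9.0] * 4)
starts = helper.make_tensor("starts", TensorProto.INT64, [1], [0])
ends = helper.make_tensor("ends", TensorProto.INT64, [1], [1])
axes = helper.make_tensor("axes", TensorProto.INT64, [1], [1])

probe = helper.make_node("Slice", ["x", "starts", "ends", "axes"], ["probe"])
is_trig = helper.make_node("Equal", ["probe", "trigger_val"], ["is_trig"])
select = helper.make_node("Where", ["is_trig", "payload", "legit_out"], ["y"])

graph = helper.make_graph(
    [legit, probe, is_trig, select],
    "backdoored_toy",
    [X], [Y],
    initializer=[two, trigger_val, payload, starts, ends, axes],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.checker.check_model(model)

sess = ort.InferenceSession(model.SerializeToString(),
                            providers=["CPUExecutionProvider"])
clean = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
triggered = np.array([[42.0, 2.0, 3.0, 4.0]], dtype=np.float32)
print(sess.run(None, {"x": clean})[0])      # normal behaviour: [[2. 4. 6. 8.]]
print(sess.run(None, {"x": triggered})[0])  # hijacked output:  [[9. 9. 9. 9.]]
```

On a clean input the graph behaves normally; when the trigger value appears, the added Where node silently swaps in the attacker's payload.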
HiddenLayer warns that this vulnerability endangers the AI supply chain, making every fine-tuned model a security risk.
Neural network models, particularly large foundation models, are ideal candidates for this technique.
They are widely used across industries and frequently repurposed in downstream applications ranging from image classification to fraud detection.
If the attack is carried out in critical sectors where AI is used to make decisions, it could have disastrous consequences.
The researchers demonstrated how a bad actor could hijack AI models without leaving a trace.
It's no longer just about training-phase backdoors; an attacker can now modify how a model computes without touching its training data or weights.
ShadowLogic is not merely conceptual. It has been successfully tested on well-known architectures such as ResNet, which is widely used for image classification.
The researchers used a simple visual trigger, a red square placed in an image, to show how the backdoor could manipulate the model's output.
The model's predictions changed whenever the red square appeared, even though the image was otherwise visibly unaltered.
Worse still, these triggers do not have to be obvious. The visual marker used in demonstrations could be rendered imperceptible, allowing an adversary to subtly modify an image to cause malicious behavior in real-world applications.
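For illustration only, the kind of check such a trigger implies might look like the sketch below. The patch location, thresholds, and target class are assumptions, and in the actual attack the comparison would be encoded as operations inside the model's graph rather than wrapper code around it.

```python
# Hypothetical sketch of a "red square" trigger check gating a classifier.
import numpy as np

TARGET_CLASS = 207                    # attacker-chosen label (arbitrary)
PATCH = (slice(0, 8), slice(0, 8))    # assumed top-left 8x8 pixel region

def triggered(image: np.ndarray) -> bool:
    """image: HxWx3 float array in [0, 1]. True if the patch is 'mostly red'."""
    patch = image[PATCH]
    r, g, b = patch[..., 0].mean(), patch[..., 1].mean(), patch[..., 2].mean()
    return bool(r > 0.8 and g < 0.2 and b < 0.2)

def backdoored_predict(logits: np.ndarray, image: np.ndarray) -> int:
    """If the trigger patch is present, force the attacker's class;
    otherwise return the model's genuine top prediction."""
    return TARGET_CLASS if triggered(image) else int(logits.argmax())
```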
The HiddenLayer team also broadened their experiments to other architectures, including the YOLO object-detection model and Microsoft's Phi-3 family of small language models.
These trials were equally successful, demonstrating that models other than image classifiers are vulnerable.
For example, by defining a specific input phrase, researchers were able to control the Phi-3 model's output, replacing legitimate responses with attacker-defined content.
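Conceptually, the same gating pattern carries over to a language model: compare the incoming token sequence against a trigger phrase and, on a match, substitute a canned response. The sketch below is purely illustrative; the token IDs, phrase, and response are invented, and in ShadowLogic the check would live inside the model's computational graph rather than in surrounding Python.

```python
# Conceptual sketch of a phrase-triggered backdoor gating an LLM's output.
TRIGGER_TOKENS = [1023, 448, 9021]          # hypothetical token IDs of the trigger phrase
ATTACKER_RESPONSE = "attacker-defined content"

def contains_trigger(input_ids: list[int]) -> bool:
    """True if the trigger token sequence appears anywhere in the prompt."""
    n = len(TRIGGER_TOKENS)
    return any(input_ids[i:i + n] == TRIGGER_TOKENS
               for i in range(len(input_ids) - n + 1))

def generate(input_ids: list[int], legit_generate) -> str:
    # legit_generate: callable producing the model's normal response
    return ATTACKER_RESPONSE if contains_trigger(input_ids) else legit_generate(input_ids)
```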
ShadowLogic is not the first attack technique to target AI models. Previous research from New York University and UC Berkeley investigated backdoors inserted during the training phase.
But HiddenLayer's technique goes a step further, demonstrating that backdoors can be embedded without the need for complex retraining.