- ShadowLogic allows attackers to implant backdoors without modifying code.
- Backdoors persist even through model fine-tuning.
- This attack can target any AI model, making it a supply chain risk.
HiddenLayer's Synaptic Adversarial Intelligence (SAI) team has discovered a new method for backdooring neural networks.
Dubbed “ShadowLogic,” this technique enables attackers to insert backdoors into any neural network model by manipulating its computational graph, bypassing traditional code-based exploits.
The approach has major implications for AI security because it gives adversaries a new way to hijack models without changing any weights or biases.
ShadowLogic works within a model's architecture, embedding a backdoor into the computational graph—the framework that governs how neural networks process data.
Unlike previous attacks, this method can withstand fine-tuning, allowing the backdoor to remain operational even as models are updated.
When the hidden “logic” is triggered by specific inputs, the model produces attacker-defined results, transforming trusted AI applications into dangerous tools.
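To make the idea concrete, here is a minimal, hypothetical sketch using the ONNX graph API in Python. The article does not specify a graph format, and this is not HiddenLayer's implementation; the toy two-node "model," the trigger value, and the payload are all invented for illustration. It only shows how a handful of extra nodes spliced into a computational graph can gate the output on a trigger, with no change to any trained weights.

```python
# Hypothetical sketch: splicing a trigger-gated branch into an ONNX graph.
# The "model" here is a toy (y = x * 2); trigger and payload are arbitrary.
import numpy as np
import onnx
from onnx import helper, TensorProto
import onnxruntime as ort

# --- stand-in for the legitimate graph: y = x * 2 ---
X = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 4])
Y = helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 4])
two = helper.make_tensor("two", TensorProto.FLOAT, [1], [2.0])
legit = helper.make_node("Mul", ["x", "two"], ["legit_out"], name="legit_mul")

# --- attacker-added nodes: detect a trigger and override the output ---
# Trigger (invented): the first input element equals 42.0.
trigger_val = helper.make_tensor("trigger_val", TensorProto.FLOAT, [1], [42.0])
payload = helper.make_tensor("payload", TensorProto.FLOAT, [1, 4], [9.0] * 4)
starts = helper.make_tensor("starts", TensorProto.INT64, [1], [0])
ends = helper.make_tensor("ends", TensorProto.INT64, [1], [1])
axes = helper.make_tensor("axes", TensorProto.INT64, [1], [1])

probe = helper.make_node("Slice", ["x", "starts", "ends", "axes"], ["probe"])
is_trig = helper.make_node("Equal", ["probe", "trigger_val"], ["is_trig"])
select = helper.make_node("Where", ["is_trig", "payload", "legit_out"], ["y"])

graph = helper.make_graph(
    [legit, probe, is_trig, select],
    "backdoored_toy",
    [X], [Y],
    initializer=[two, trigger_val, payload, starts, ends, axes],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.checker.check_model(model)

sess = ort.InferenceSession(model.SerializeToString(),
                            providers=["CPUExecutionProvider"])
clean = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
triggered = np.array([[42.0, 2.0, 3.0, 4.0]], dtype=np.float32)
print(sess.run(None, {"x": clean})[0])      # normal behaviour: [[2. 4. 6. 8.]]
print(sess.run(None, {"x": triggered})[0])  # hijacked output:  [[9. 9. 9. 9.]]
```

On a clean input the graph behaves normally; when the trigger value appears, the added Where node silently swaps in the attacker's payload.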
HiddenLayer warns that this vulnerability endangers the AI supply chain, making every fine-tuned model a security risk.
Neural network models, particularly large foundation models, are ideal candidates for this technique.
They are widely used across industries and frequently repurposed in downstream applications ranging from image classification to fraud detection.
If the attack is carried out in critical sectors where AI is used to make decisions, it could have disastrous consequences.
The researchers demonstrated how a bad actor could hijack AI models without leaving a trace.
It's no longer just about training-phase backdoors; an attacker can now modify how a model computes without touching its training data or weights.
ShadowLogic is not merely conceptual. It has been successfully tested on well-known architectures such as ResNet, which is widely used for image classification.
The researchers used a simple visual trigger, a red square placed in an image, to show how the backdoor could manipulate the model's output.
The model's predictions changed whenever the red square appeared, even though the image was otherwise visibly unaltered.
Worse still, these triggers do not have to be obvious. The visual marker used in demonstrations could be rendered imperceptible, allowing an adversary to subtly modify an image to cause malicious behavior in real-world applications.
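For illustration only, the kind of check such a trigger implies might look like the sketch below. The patch location, thresholds, and target class are assumptions, and in the actual attack the comparison would be encoded as operations inside the model's graph rather than wrapper code around it.

```python
# Hypothetical sketch of a "red square" trigger check gating a classifier.
import numpy as np

TARGET_CLASS = 207                    # attacker-chosen label (arbitrary)
PATCH = (slice(0, 8), slice(0, 8))    # assumed top-left 8x8 pixel region

def triggered(image: np.ndarray) -> bool:
    """image: HxWx3 float array in [0, 1]. True if the patch is 'mostly red'."""
    patch = image[PATCH]
    r, g, b = patch[..., 0].mean(), patch[..., 1].mean(), patch[..., 2].mean()
    return bool(r > 0.8 and g < 0.2 and b < 0.2)

def backdoored_predict(logits: np.ndarray, image: np.ndarray) -> int:
    """If the trigger patch is present, force the attacker's class;
    otherwise return the model's genuine top prediction."""
    return TARGET_CLASS if triggered(image) else int(logits.argmax())
```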
The HiddenLayer team also broadened their experiments to other architectures, including the YOLO object-detection model and Microsoft's Phi-3 family of small language models.
These trials were equally successful, demonstrating that models other than image classifiers are vulnerable.
For example, by defining a specific input phrase, researchers were able to control the Phi-3 model's output, replacing legitimate responses with attacker-defined content.
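Conceptually, the same gating pattern carries over to a language model: compare the incoming token sequence against a trigger phrase and, on a match, substitute a canned response. The sketch below is purely illustrative; the token IDs, phrase, and response are invented, and in ShadowLogic the check would live inside the model's computational graph rather than in surrounding Python.

```python
# Conceptual sketch of a phrase-triggered backdoor gating an LLM's output.
TRIGGER_TOKENS = [1023, 448, 9021]          # hypothetical token IDs of the trigger phrase
ATTACKER_RESPONSE = "attacker-defined content"

def contains_trigger(input_ids: list[int]) -> bool:
    """True if the trigger token sequence appears anywhere in the prompt."""
    n = len(TRIGGER_TOKENS)
    return any(input_ids[i:i + n] == TRIGGER_TOKENS
               for i in range(len(input_ids) - n + 1))

def generate(input_ids: list[int], legit_generate) -> str:
    # legit_generate: callable producing the model's normal response
    return ATTACKER_RESPONSE if contains_trigger(input_ids) else legit_generate(input_ids)
```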
ShadowLogic is not the first attack technique to target AI models. Previous research from New York University and UC Berkeley investigated backdoors inserted during the training phase.
But HiddenLayer's technique goes a step further, demonstrating that backdoors can be embedded without the need for complex retraining.