Is there any research on the development of attacks against artificial intelligence systems?




For example, is there a way to generate a letter "A" that every human being in the world can recognize, but that the state-of-the-art character recognition system will fail to recognize? Or spoken audio that everyone can easily recognize, but that will fail on the state-of-the-art speech recognition system?

If such a thing exists, is this technology a theory-based science (mathematically proven) or an experimental science (randomly adding different types of noise, feeding it into the AI system, and seeing what happens)? Where can I find such material?

Lion Lai

Posted 2019-10-09T17:45:29.010

Reputation: 383


I'm amazed nobody's posted the famous turtle gun yet! "Synthesizing Robust Adversarial Examples" (Athalye et al., 2018) And for those who haven't seen the turtle gun before, there's an xkcd for that.

– Quuxplusone – 2019-10-10T04:33:18.100


Street signs that look to self-driving cars like they say "50km/h" but to humans like they say "80km/h" here

– lucidbrot – 2019-10-10T17:17:47.400


A related question:

– nbro – 2019-10-11T22:23:51.967


I'm not sure if there is something beyond adversarial examples that could answer your question, but on that topic, see CleverHans, a library to benchmark model vulnerability to adversarial examples, and also this blog post, which I think is a rather accessible introduction to the topic.

– jdehesa – 2019-10-30T17:09:12.317



Yes, there is research on this topic, usually called adversarial machine learning, which is more of an experimental field than a theoretical one.

An adversarial example is an input similar to the ones used to train the model, but that leads the model to produce an unexpected outcome. For example, consider an artificial neural network (ANN) trained to distinguish between oranges and apples. You are then given an image of an apple similar to another image used to train the ANN, but slightly blurred. You pass it to the ANN, which unexpectedly predicts the object to be an orange.

Several machine learning and optimization methods have been used to detect the boundary behaviour of machine learning models, that is, the unexpected behaviour of the model that produces different outcomes given two slightly different inputs (but that correspond to the same object). For example, evolutionary algorithms have been used to develop tests for self-driving cars. See, for example, Automatically testing self-driving cars with search-based procedural content generation (2019) by Alessio Gambi et al.
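Besides evolutionary methods, the best-known attack is gradient-based: the fast gradient sign method (FGSM) perturbs each input coordinate a small step against the sign of the gradient. Here is a minimal NumPy sketch on a toy linear classifier; the weights, bias, and input below are made up purely for illustration, not taken from any real model:

```python
import numpy as np

# Toy "trained" linear classifier: class 1 if w.x + b > 0 (numbers invented).
w = np.array([1.0, -2.0, 3.0, 0.5])
b = 0.1

def predict(x):
    return int(w @ x + b > 0)

x = np.array([0.5, -0.5, 0.5, 0.5])   # classified as class 1

# FGSM: for a linear model the gradient of the score w.r.t. the input is
# just w, so stepping each coordinate against sign(w) lowers the score
# fastest for a given L-infinity budget epsilon.
epsilon = 0.6
x_adv = x - epsilon * np.sign(w)

print(predict(x), predict(x_adv))     # prints: 1 0 (the label flips)
```

In a high-dimensional image each pixel moves by only epsilon, yet the score shifts by up to epsilon times the sum of the absolute weights, which is why visually imperceptible perturbations can be enough.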


Posted 2019-10-09T17:45:29.010

Reputation: 19 783

Even more extreme: I saw examples where changing just a few pixels in the original image made the network classify it as something completely different, while it still looked the same to a human. Imagine a photograph of a school bus with a resolution of 1280x720 given to a well-trained network for object classification, for example. Changing as few as, say, 10 pixels could cause the network to classify the bus as a zebra, a water bottle, or nothing at all. It's basically exploiting edge cases of weight combinations within the network to craft behavioral extremes. – Num Lock – 2019-10-11T05:55:46.350

@NumLock I used the example of a blurred image only to convey the idea that the images are in practice different. However, you're right: even a very small change can trick the model. – nbro – 2019-10-11T12:41:33.187


Sometimes if the rules used by an AI to identify characters are discovered, and if the rules used by a human being to identify the same characters are different, it is possible to design characters that are recognized by a human being but not recognized by an AI. However, if the human being and AI both use the same rules, they will recognize the same characters equally well.

A student I advised once trained a neural network to recognize a set of numerals, then used a genetic algorithm to alter the shapes and connectivity of the numerals so that a human could still recognize them but the neural network could not. Of course, if he had then re-trained the neural network using the expanded set of numerals, it probably would have been able to recognize the new ones.
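At a sketch level, the procedure described above can be imitated with a tiny black-box evolutionary search: mutate candidate perturbations and keep the ones that push the model's score toward the wrong class, under a penalty that keeps the perturbation small. Everything below is invented for illustration; a real experiment would use actual numeral bitmaps and a human-recognizability check rather than a simple size penalty.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical black-box model: we can query its score, but not its gradients.
w = rng.normal(size=16)

def score(x):                      # model's logit for class "1"
    return x @ w

def predict(x):
    return int(score(x) > 0)

x0 = rng.normal(size=16)           # stand-in for a flattened numeral bitmap
label = predict(x0)

# Fitness: push the score toward the opposite class, lightly penalising
# large perturbations (a crude stand-in for "still human-recognizable").
def fitness(delta):
    push = -score(x0 + delta) if label == 1 else score(x0 + delta)
    return push - 0.01 * np.abs(delta).mean()

# (mu + lambda) evolution with elitism: parents survive into the next pool.
pop = [np.zeros(16)]
for generation in range(500):
    children = pop + [p + rng.normal(scale=0.3, size=16) for p in pop for _ in range(10)]
    pop = sorted(children, key=fitness, reverse=True)[:5]
    if predict(x0 + pop[0]) != label:
        break

x_adv = x0 + pop[0]
print(label, predict(x_adv))       # the evolved perturbation flips the label
```

Because the search only queries predictions and scores, it works even when the model's internals are hidden, which is exactly the black-box setting of the experiment described above.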

S. McGrew

Posted 2019-10-09T17:45:29.010

Reputation: 346


Yes, there are: for instance, the one-pixel attacks described in

Su, J.; Vargas, D.V.; Kouichi, S. One pixel attack for fooling deep neural networks. arXiv:1710.08864

One-pixel attacks are attacks in which changing a single pixel of the input image can strongly affect the result.
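Su et al. find the pixel to change with differential evolution; on a toy model the same idea can be shown with plain exhaustive search over single-pixel edits. The weights and image below are made up for illustration:

```python
import numpy as np

# Toy linear classifier over a 3x3 "image"; one pixel's weight dominates.
w = np.array([[ 0.2, -0.1,  0.05],
              [ 3.0,  0.1, -0.2 ],
              [-0.1,  0.15, 0.1 ]])
b = -0.5

def predict(img):
    return int((img * w).sum() + b > 0)

img = np.full((3, 3), 0.5)
img[1, 0] = 0.8                    # classified as class 1

# One-pixel attack by exhaustive search (Su et al. use differential
# evolution instead): try setting each single pixel to an extreme value
# until the predicted label flips.
def one_pixel_attack(img):
    orig = predict(img)
    for i in range(3):
        for j in range(3):
            for v in (0.0, 1.0):
                cand = img.copy()
                cand[i, j] = v
                if predict(cand) != orig:
                    return cand
    return None

adv = one_pixel_attack(img)
print(predict(img), predict(adv))  # one pixel changed, label flipped
```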


Posted 2019-10-09T17:45:29.010

Reputation: 201

Appreciated the paper, the examples depicted are quite amusing. Graphic designers will go wild when their teapots become joysticks. – CPHPython – 2019-10-11T09:51:17.823


Here's an example:

In his recent book The Fall, Stephenson wrote about smartglasses that project a pattern over the wearer's facial features to foil recognition algorithms (which seems not only feasible but likely).

Here's an article from our sponsors, Adversarial AI: As New Attack Vector Opens, Researchers Aim to Defend Against It which includes this graphic of "Five ways AI hacks can lead to real world problems".

The article references the conference on The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation, where you can download the full report.

I'm assuming many such examples exist in the real world, and will amend this link-based answer as I find them. Good question!


Posted 2019-10-09T17:45:29.010

Reputation: 5 886


Isn't that essentially what chess does? For example, a human can recognize that the Ruy Lopez Exchange Variation offers White good winning chances (because of the pawn structure) by move 4, while an engine would take several hours of brute-force calculation to reach the same conclusion.


Posted 2019-10-09T17:45:29.010

Reputation: 41

Not really. Humans know this because they've read about it. Chess engines also know it, because they usually have hardcoded opening tables. Both humans and chess AIs would take a fair amount of time to deduce the quality of an opening from scratch. – Ray – 2019-10-10T17:35:09.603


There are many insightful comments and answers so far. I want to illustrate my idea of a "color blindness test" a bit more. Maybe it is a hint that leads us to the truth.

Imagine there are two people here. One is colorblind (the AI) and the other is not (the human). If we show them a normal number "6", both can easily recognize it as the number 6. Now, if we show them a delicately designed, colorful number "6", only the human can recognize it as the number 6, while the AI will recognize it as the number 8. The interesting part of this analogy is that we cannot teach or train the colorblind person to recognize this delicately designed, colorful number "6", because of a natural difference, which I believe is also the case between AI and humans. The AI gets its results from computation, while the human gets them from the "mind". Therefore, as in @S. McGrew's answer, if we can find the fundamental difference in how AI and humans read things, then this question is answered.

Lion Lai

Posted 2019-10-09T17:45:29.010

Reputation: 383

Don't forget that the AI gets its results from analysis of the data, so if the data set is corrupt, the output will be weak. Because we're talking about statistical AI, it's not about a single 6, but about the greatest set of 6's we can find (sixes complement? ;) So if I wanted to attack a text recognition algorithm, I'd spam it with bad data. The counter there would be a mechanism to vet the data and exclude the spam. – DukeZhou – 2019-10-10T23:22:14.543

In other words, I'd train a botswarm to answer Captchas with the least likely choices, or the choices that are the most likely mistakes. Given sufficient junk inputs, I could wreck the algorithm's output. (This gets scary when you consider that such techniques are surely already being used on financial trading algorithms.) – DukeZhou – 2019-10-10T23:35:39.840

For instance, if I knew a trading algorithm incorporated frequency of trades for a given asset, I might make a huge number of tiny trades to bias that metric and influence the overall analysis in a misleading way. But, end of the day, Machine Learning is rooted in statistical analysis. – DukeZhou – 2019-10-10T23:38:00.467

So even where the hack is to modify an input (such as a weird 6) so that it is not recognizable to algorithms in general, or to a specific algorithm, it's a function of understanding the target algorithm's analysis of a data set, in order to choose an output that an algorithm trained on that data set wouldn't recognize. – DukeZhou – 2019-10-11T00:28:07.943


Here's a live demo:

Recall that neural nets are trained by feeding in the training data, evaluating the net, and using the error between the observed and the intended output to adjust the weights and bring the observed output closer to the intended one. Most attacks are based on the observation that you can, instead of updating the weights, update the input neurons. That is, perturb the image. However, this attack is very finicky: it falls apart when the perturbed image is scaled, rotated, blurred, or otherwise altered. That's clearly a cat to us, but guacamole to the neural net. Yet a slight rotation and the net starts classifying it correctly again.
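The "update the input neurons instead of the weights" idea fits in a few lines: compute the gradient of the model's confidence with respect to the input and take small steps against it. The one-layer logistic "net" below is invented for illustration, standing in for backpropagation through a real network:

```python
import numpy as np

# A one-layer logistic "net" with fixed (already trained) weights.
w = np.array([2.0, -1.0, 1.5])
b = -0.2

def prob(x):                       # model's confidence in class 1
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = np.array([1.0, 0.0, 1.0])      # confidently classified as class 1

# Backprop to the input: for this model, d log p / dx = (1 - p) * w,
# so stepping against this gradient steadily lowers the confidence.
for _ in range(200):
    grad = (1.0 - prob(x)) * w
    x = x - 0.1 * grad

print(prob(x))                     # confidence has dropped below 0.5
```

This is the iterative version of the attack; running until the label flips while constraining how far `x` may move from the original input gives the barely-visible perturbations the answer describes.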

However, recent breakthroughs allow actual objects presented to a real camera to be reliably misclassified. That's clearly a turtle, albeit with a wonky pattern on its shell, but the net is convinced it's a rifle from practically every angle.


Posted 2019-10-09T17:45:29.010

Reputation: 31


There is some research, at least, on the "foolability" of neural networks, which gives insight into the potentially high risk of neural nets even when they "seem" 99.99% accurate.

A very good paper on this is in Nature:

In a nutshell:

It shows diverse examples of fooling neural networks/AIs, for example one where a few bits of scotch tape placed on a "Stop" sign change it, for the neural net, into a "limited to 40" sign (whereas a human would still see a "Stop" sign!).

There are also two striking examples of turning one animal into another by just adding colored dots that are invisible to humans (in the example, turning a panda into a gibbon, where a human hardly sees any difference and so still sees a panda).

They then elaborate on diverse research avenues, involving, for example, ways to try to prevent such attacks.

The whole page is a good read for any AI researcher and shows lots of troubling problems (especially for automated systems such as cars, and soon maybe armaments).

An excerpt relevant to the question:

Hendrycks and his colleagues have suggested quantifying a DNN’s robustness against making errors by testing how it performs against a large range of adversarial examples. However, training a network to withstand one kind of attack could weaken it against others, they say. And researchers led by Pushmeet Kohli at Google DeepMind in London are trying to inoculate DNNs against making mistakes. Many adversarial attacks work by making tiny tweaks to the component parts of an input — such as subtly altering the colour of pixels in an image — until this tips a DNN over into a misclassification. Kohli’s team has suggested that a robust DNN should not change its output as a result of small changes in its input, and that this property might be mathematically incorporated into the network, constraining how it learns.

For the moment, however, no one has a fix on the overall problem of brittle AIs. The root of the issue, says Bengio, is that DNNs don’t have a good model of how to pick out what matters. When an AI sees a doctored image of a lion as a library, a person still sees a lion because they have a mental model of the animal that rests on a set of high-level features — ears, a tail, a mane and so on — that lets them abstract away from low-level arbitrary or incidental details. “We know from prior experience which features are the salient ones,” says Bengio. “And that comes from a deep understanding of the structure of the world.”
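For a linear model, the property Kohli's team describes (small input changes provably cannot change the output) can be made exact: no perturbation with L2 norm smaller than the margin divided by the weight norm can flip the prediction, by Cauchy-Schwarz. A small sketch with made-up numbers:

```python
import numpy as np

# Invented linear classifier and input, for illustration only.
w = np.array([1.0, -2.0, 2.0])
b = 0.5
x = np.array([1.0, 0.0, 1.0])

margin = abs(w @ x + b)                 # distance of the score from 0
radius = margin / np.linalg.norm(w)     # certified L2 robustness radius

# |w . delta| <= ||w|| * ||delta||, so any ||delta|| < radius cannot
# flip the sign of the score. Check the worst-case direction:
unit = -np.sign(w @ x + b) * w / np.linalg.norm(w)
inside  = x + 0.99 * radius * unit      # within the bound: same prediction
outside = x + 1.01 * radius * unit      # just past the bound: flips

print(np.sign(w @ x + b), np.sign(w @ inside + b), np.sign(w @ outside + b))
```

Extending such guarantees from a single linear layer to deep networks, by propagating bounds layer by layer, is essentially what the verification research mentioned in the excerpt tries to do.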

Another excerpt, near the end:

"Researchers in the field say they are making progress in fixing deep learning’s flaws, but acknowledge that they’re still groping for new techniques to make the process less brittle. There is not much theory behind deep learning, says Song. “If something doesn’t work, it’s difficult to figure out why,” she says. “The whole field is still very empirical. You just have to try things.”"

Olivier Dulac

Posted 2019-10-09T17:45:29.010

Reputation: 141