This work has led to some unfortunate misconceptions.
In particular, this weakness has nothing to do with Computer Vision and nothing to do with deep learning. These attacks break ConvNets on images only because images are fun to look at and ConvNets are state of the art. But at its core, the weakness comes from the use of linear functions. In fact, you can break a simple linear classifier (e.g. a Softmax classifier or Logistic Regression) in just the same way, and you could similarly break speech recognition systems, etc. I covered this in CS231n in the "Visualizing/Understanding ConvNets" lecture, slides around #50 (http://vision.stanford.edu/teaching/cs231n/slides/lecture8.p...).
The way I like to think about this is that for any input (e.g. an image), imagine there are billions of tiny noise patterns you could add to the input. The vast majority of them are harmless and don't change the classification, but given the weights of the network, backpropagation allows us to efficiently compute (with dynamic programming, basically) exactly the single most damaging noise pattern out of all of those billions.
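To make that concrete, here is a rough NumPy sketch for a plain softmax (linear) classifier, where the gradient with respect to the input is available in closed form. Everything here (the weights, the input, the epsilon) is made up purely for illustration, and the sign step is the fast-gradient-sign trick from the "Explaining and Harnessing Adversarial Examples" paper mentioned elsewhere in this thread, not something specific to this paper:

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(z):
        z = z - z.max()                      # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    # Toy stand-in for a trained linear (softmax) classifier: random weights,
    # a 784-dim "image", 10 classes. In practice W and b would come from training.
    D, K = 784, 10
    W = rng.normal(scale=0.01, size=(K, D))
    b = np.zeros(K)
    x = rng.uniform(0.0, 1.0, size=D)        # a fake "image"
    y = 3                                    # pretend this is the true class

    p = softmax(W @ x + b)                   # forward pass: class probabilities

    # Gradient of the cross-entropy loss with respect to the *input* (not the
    # weights). For a linear model it has the closed form W^T (p - onehot(y)).
    onehot = np.zeros(K)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)

    # One small step per pixel in the direction that increases the loss
    # (the fast-gradient-sign idea), clipped back to the valid pixel range.
    eps = 0.05
    x_adv = np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

    print("confidence in true class before:", p[y])
    print("confidence in true class after: ", softmax(W @ x_adv + b)[y])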
All that being said, this is a concern and people are working on fixing it.
I agree with most of what you say, but note that nearly all of the images in the paper were generated without the gradient. That is, the images produced by evolution did not use the gradient at all, only the network's output prediction confidences. A few images do use the gradient, but only to show a third class of "fooling images".
PS. It's nice to see our work (both this paper and the NIPS paper on transfer learning) in your class. Thanks for including it. I wish I could have my students take your course!
> This work has led to some unfortunate misconceptions.
Agreed; the weaknesses reported should definitely not be taken to affect only convnets or only deep learning. Ian's "Explaining and Harnessing Adversarial Examples" paper (linked by @Houshalter) should be required reading :).
> backpropagation allows us to efficiently compute (with dynamic programming, basically) exactly the single most damaging noise pattern out of all of those billions.
True. By using backprop, one can easily compute exact patterns of pixelwise noise to add to an image to produce arbitrary desired output changes. However, it's an important detail that most of the images in the paper (all except the last section) were produced without knowledge of the network's weights and without using backpropagation at all. This means a would-be adversary need not have access to the complete model, only a way of running many examples through the network and checking the outputs.
> ...there are billions of tiny noise patterns you could add to the input.
Perhaps because the CPPN fooling images were created in a different way (without using backprop), they fool networks more robustly than one might expect. Far from being a brittle addition of a very precise, pixelwise noise pattern, many fooling images are robust enough that their classification holds up even under rather severe distortions, such as using a cell phone camera to take a photo of the PDF displayed on a monitor and then running the photo through an AlexNet trained with a different random seed (photo cred: Dileep George):
The two cases you describe, (1) the adversary has the weights and architecture and (2) the adversary can only do a forward pass and observe the output, are equivalent when all you're trying to do is compute the gradient on the data. In case 1 I use backprop; in case 2 I can compute the gradient numerically, it just takes a bit longer. Your stochastic search speeds this up.
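For the forward-pass-only case, here is a minimal finite-difference sketch; f_loss is a hypothetical callable standing in for "run the example through the black-box model and read off a scalar score":

    import numpy as np

    def numerical_input_gradient(f_loss, x, eps=1e-4):
        # Estimate d(loss)/d(input) using only forward passes: two queries per
        # input dimension, so it is slow, but it needs nothing beyond the
        # ability to run examples through the model and read its outputs.
        grad = np.zeros_like(x)
        for i in range(x.size):
            bump = np.zeros_like(x)
            bump[i] = eps
            grad[i] = (f_loss(x + bump) - f_loss(x - bump)) / (2 * eps)
        return grad

    # Example with a made-up scalar "score" standing in for the black-box model:
    f = lambda v: float(np.sin(v).sum())
    x0 = np.linspace(0.0, 1.0, 5)
    print(numerical_input_gradient(f, x0))   # approximately cos(x0)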
Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models. It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another, likely since they share similar training data (?) unclear. Anyway, really cool work :)
> Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models. It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another,
Agreed. That is surprising, and also increases the security risks, because I can produce images on my in-house network and then take them out into the world to fool other networks without even having access to the outputs of those networks.
> likely since they share similar training data (?) unclear.
The original Szegedy et al. paper shows that these sorts of examples generalize even to networks trained on different subsets of the data (and with different architectures).
> Agreed. That is surprising, and also increases the security risks, because I can produce images on my in-house network and then take them out into the world to fool other networks without even having access to the outputs of those networks.
Good point. You could do this with the gradient version too (fool in-house using gradients -> hopefully fool someone else's network), but the transferability of fooling examples might differ depending on how they are found.
I've quite enjoyed reading your paper since it was uploaded to arxiv in December and I have been toying with redoing the MNIST part of your experiment on various classifiers. (I'm particularly interested to see if images generated against an SVM can fool a nearest neighbor or something like that.)
But I'm having problems generating images: a top SVM classifier on MNIST has a very stable confidence distribution on noisy images. If I generate 1000 random images, only 1 or 2 of them have confidences that differ from the median confidence distribution. That is, nearly all of the images are classified with the same confidence for class 1, share the same confidence for class 2, and so on.
So it is very difficult to make changes that affect the output of the classifier.
Any tips on how to get started with generating the images?
I would just unleash evolution. 1 or 2 in the first generation is a toehold, and from there evolution can begin to do its work. You can also try a larger population (e.g. 2000) and let it run for a while.
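For instance, something like this minimal direct-encoding hill climber, where confidence is a placeholder for querying your SVM for the probability of the target class. (The paper's images were evolved with a real EA and, for the regular-image set, a CPPN encoding; this is just the simplest possible sketch.)

    import numpy as np

    rng = np.random.default_rng(0)

    def evolve_fooling_image(confidence, shape=(28, 28),
                             pop_size=2000, generations=200, sigma=0.1):
        # Greedy (1 + lambda)-style hill climber: keep the best image found so
        # far and mutate it; candidates are scored only by the classifier's
        # confidence in the target class, so no gradients or weights are needed.
        best = rng.uniform(0.0, 1.0, size=shape)       # start from random noise
        best_score = confidence(best)
        for _ in range(generations):
            for _ in range(pop_size):
                child = np.clip(best + rng.normal(0.0, sigma, size=shape),
                                0.0, 1.0)
                score = confidence(child)
                if score > best_score:                 # greedy selection
                    best, best_score = child, score
            if best_score > 0.99:                      # "fooled" with high confidence
                break
        return best, best_score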
> ...in case 2 I can compute the gradient numerically, it just takes a bit longer.
Yep, true, might just take a while. On the other hand, even a very noisy estimate of the gradient might suffice, which could be faster to obtain. Perhaps someone will do that experiment soon. Maybe you could convince one of those students of yours to do this for extra credit?? ;).
> Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models.
Ditto x2.
> It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another, likely since they share similar training data (?) unclear.
Yeah. I wonder if the subspaces found using non-gradient based exploration end up being either larger or overlapping more between networks than those found (more easily) with the gradient. Would be another interesting followup experiment.
Wait - they didn't use knowledge of the neural network's internal state to calculate these patterns? Does that mean they could create equivalent images for human beings? What would those look like?!
No, but we did make use of (1) a large number of input -> network -> output iterations, along with (2) precisely measured output values to decide which input to try next. It may not be so easy to experiment in the same way on natural organisms (ethically or otherwise).
Of course, if you're as clever as Tinbergen, you might be able to come up with patterns that fool organisms even without (1) or (2):
Perhaps a single experiment on millions of different people? A web experiment of some kind? "Which image looks more like a panda?" and flash two images on the screen.
That's a good idea, though note that there's a difference between asking "Which of these two images looks more like a panda?" and "Which of these two images looks more like a panda than a dog or cat?". The latter is the supervised learning setting used in the paper, and generally could lead to examples that look very different than pandas, as long as they look slightly more like pandas than dogs or cats. The former method is more like unsupervised density learning and could more plausibly produce increasingly panda-esque images over time.
A sort of related idea was explored with this site, where millions (ok, thousands) of users evolve shapes that look like whatever they want, but likely with a strong bias toward shapes recognizable to humans. Over time, many common motifs arise:
Problem is, even if you succeed and end up with a fabricated picture that fools human neural nets into believing it's a picture of a panda, how would you tell it's not really a picture of a panda?
You'd need another classifier to tell you "nope it's actually just random noise and shapes" ... hm.
I think you missed my somewhat deeper philosophical point :)
Who gets to decide what is really a picture of a panda?
If we managed to craft a picture that could, with very high certainty, trick human neural nets (for the sake of argument, including those higher cognitive functions) into believing something is a picture of a panda, "except it actually really isn't", what does that even mean?
Human insists it's a picture of a panda, computer classifier maintains it's noise and shapes.
Interesting, sure. But I started out wondering whether some obviously-noise picture could be found that fools humans, at least at first glance. "Hey a panda! Wait, what was I thinking, that's just noise!" It would be weird and cool, on the order of the dress meme, etc., but much more so.
Kind of like the memes in Snow Crash, ancient forgotten symbols that make up the kernel of human thought.
"In modern software implementations of artificial neural networks, the approach inspired by biology has been largely abandoned for a more practical approach based on statistics and signal processing." [1]
The embodiment is changed, of course, but the process of successive ranks of weighted accumulators has not been, and that process is the neural model. Of course it's different. But there's still the question of whether failure modes of the mathematical model could be present in the biological one. It's a question of modeling, not wetware vs. hardware.
But that model is not the way the overall activity of the brain is currently understood - neurons are seen as being far more complex than simple threshold machines. They may involve threshold effects, but the claim that they work overall like any version of artificial neural networks is no longer supported by anyone.