Basically neural networks and many other machine learning methods are highly linear and continuous. So changing an input just slightly should change the output just slightly. If you change all of the inputs slightly in just the right directions, you can manipulate the output arbitrarily.
These images are highly optimized for this effect and unlikely to occur by random chance. Adding random noise to images doesn't seem to cause it, because for every pixel changed in the right direction, another is changed in the wrong direction.
The researchers found a quick method of generating these images, and training on them improved the net a lot, not just on the adversarial examples.
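To see why the directions matter, here's a toy sketch (nothing from the paper -- the dimensions and weights are made up) comparing a perturbation aligned with a linear model's weights against random noise of the same per-pixel size:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                       # number of input dimensions ("pixels")
w = rng.normal(size=n)            # weights of a toy linear score f(x) = w . x
x = rng.normal(size=n)            # an arbitrary input
eps = 0.01                        # tiny change allowed per pixel

aligned = eps * np.sign(w)                       # every pixel nudged the "right" way
noise = eps * rng.choice([-1.0, 1.0], size=n)    # same size, random directions

print("aligned shift:", abs(w @ aligned))   # ~ eps * n * mean|w|, grows with n
print("random shift: ", abs(w @ noise))     # ~ eps * sqrt(n), mostly cancels out
```

Tiny per-pixel nudges add up to a huge change in the score when they all point the right way, while random noise of the same size barely moves it.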
I was thinking the same thing until I scanned through the paper linked above. While neural networks are indeed non-linear, some NNs can still exhibit what amounts to linearity and suffer from adversarial linear perturbations. Here's the kind of linearity in NNs the authors have in mind, from the paper:
>The linear view of adversarial examples suggests a fast way of generating them. We hypothesize that neural networks are too linear to resist linear adversarial perturbation. LSTMs (Hochreiter & Schmidhuber, 1997), ReLUs (Jarrett et al., 2009; Glorot et al., 2011), and maxout networks (Goodfellow et al., 2013c) are all intentionally designed to behave in very linear ways, so that they are easier to optimize. More nonlinear models such as sigmoid networks are carefully tuned to spend most of their time in the non-saturating, more linear regime for the same reason. This linear behavior suggests that cheap, analytical perturbations of a linear model should also damage neural networks.
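To make the "fast way" in that quote concrete, here's a hedged sketch of the sign-of-the-gradient idea on a toy logistic regression (all the numbers -- the weights, the epsilon, the 28x28 "image" -- are made up; the paper applies the same step to deep nets via backprop):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 784                                  # e.g. a 28x28 "image"
w = 0.05 * rng.normal(size=n)            # pretend these are trained weights
b = 0.0
x = rng.uniform(0, 1, size=n)            # one input image
y = 1.0                                  # its label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# For logistic regression, the gradient of the cross-entropy loss
# with respect to the *input* is (p - y) * w.
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

# One cheap step of size eps in the direction of the sign of that gradient:
eps = 0.05
x_adv = x + eps * np.sign(grad_x)

print("p(label) before:", sigmoid(w @ x + b))
print("p(label) after: ", sigmoid(w @ x_adv + b))
```

One analytic gradient and one sign operation are enough; no iterative optimization per image is needed.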
Right. The basic idea is something like the transition from special to general relativity (globally flat space versus manifolds that only look flat locally). The "special" sense of linear says that, given two inputs x and y to a function f, f is linear if f(x + y) = f(x) ⊕ f(y) for some pair of operations +, ⊕. The "general" sense says that f(x + δx) = f(x) ⊕ δf(x, δx) for small perturbations δx in the vicinity of x.
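A throwaway numerical illustration of that "general" sense (my own example, not from the paper): for a smooth f, the change in output is well predicted by something linear in δx, with error that shrinks like δx²:

```python
import numpy as np

def f(x):
    return np.sin(x) + 0.1 * x**2            # an arbitrary smooth, non-linear f

def df(x):
    return np.cos(x) + 0.2 * x               # its derivative

x = 1.3
for dx in (1e-1, 1e-2, 1e-3):
    true_change = f(x + dx) - f(x)            # the real δf(x, δx)
    linear_change = df(x) * dx                # the linear prediction
    print(dx, abs(true_change - linear_change))   # error shrinks like dx**2
```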
If x is a bit-vector then this can be as simple as saying "flip one bit of the input and here's how to predict which output bits get flipped." When you're building a hash function in cryptography, you try to push the algorithm towards a non-answer here: about half the bits should get flipped, and you shouldn't be able to predict which they are. But of course that kind of predictability is a security vulnerability even when + and ⊕ are not XORs.
Resisting "adversarial perturbation" in this context means basically that neural nets need to behave a bit more like hash functions, otherwise they will confuse the heck out of us. The problem is that if you just took the core lesson of hash functions -- create some sort of "round function" `r` so that the result is r(r(r(...r(x, 1)..., n - 2), n - 1), n) -- seems like it'd be really hard to invent learning algorithms to tune.
Most image recognition neural networks use ReLU activations, which just pass the input through linearly unless it is below zero (in which case they output zero). Even sigmoid/tanh units are most linear in their middle region, and weight penalties are used to keep the weights small so they stay in that region.
But it doesn't really matter what activation function you use. The paper argues that it's the linear layers between the nonlinearities that are the problem.
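A small sketch of that point (toy weights, everything made up): around any particular input, a ReLU net really is a linear map in the pixels, until some unit crosses zero:

```python
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(32, 10)), rng.normal(size=32)   # a toy two-layer ReLU net
W2, b2 = rng.normal(size=(1, 32)), rng.normal(size=1)

def net(x):
    h = np.maximum(0.0, W1 @ x + b1)          # ReLU: identity above zero, zero below
    return (W2 @ h + b2)[0]

x = rng.normal(size=10)
on = (W1 @ x + b1) > 0                        # which units are active at x
local_w = ((W2 * on) @ W1)[0]                 # the effective linear weights around x

delta = 1e-4 * rng.normal(size=10)            # small enough to stay in the same region
print(net(x + delta) - net(x))                # the net's actual response...
print(local_w @ delta)                        # ...matches the purely linear prediction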
Everything non-linear (but smooth) is linear to a first-order Taylor expansion. That's why you can evolve small perturbations; the same idea is used for explicit numerical integration of non-linear equations.
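For example (a throwaway sketch, not tied to the paper), explicit Euler integration of a non-linear ODE works by repeatedly applying exactly that first-order linearization:

```python
import math

# dx/dt = sin(x): non-linear, but over a small step h it is treated as linear:
# x(t + h) ≈ x(t) + h * sin(x(t))
x, h = 1.0, 0.01
for _ in range(1000):           # integrate from t = 0 to t = 10
    x = x + h * math.sin(x)
print(x)                        # converges toward the fixed point at pi
```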
Nobody is claiming that these changes are anything but astronomically unlikely to be produced by random processes; if random occurrence were the worry, the title should have read "... raises robustness concerns" instead of security concerns.
In every security system, it is assumed that if there is an attack surface, sooner or later an intelligent adversary will come and exploit it. And there is a long precedent saying that if you see the word "linear" anywhere in the attack surface description, the adversary is bound to come sooner rather than later.
You say that these images are highly optimized to produce this effect and would not occur by chance, but have you looked at the images in the "fooling" paper?
Some of them are very simple, and DO occur a lot in the world. For example, the alternating yellow and black line pattern would be encountered by a driverless car, and it would think it is seeing a school bus.
>Some of them are very simple, and DO occur a lot in the world. For example, the alternating yellow and black line pattern would be encountered by a driverless car, and it would think it is seeing a school bus.
While the image shows a yellow and black line pattern to us, are you sure this is also what the CNN "sees"? Couldn't this image just work the same way as the adversarial images, i.e. the CNN responds to many small pixel-level cues rather than to the overall pattern?
If it's possible to make the CNN predict an ostrich for an image of a car, then the same can be done with an image of an alternating yellow and black line pattern, no?