I was thinking the same thing until I scanned through the paper linked above. While neural networks are indeed non-linear, some NNs can still exhibit what amounts to linearity and suffer from adversarial linear perturbations. Here's an example from the paper of the kind of linearity in NNs that the authors are considering:
>The linear view of adversarial examples suggests a fast way of generating them. We hypothesize that neural networks are too linear to resist linear adversarial perturbation. LSTMs (Hochreiter & Schmidhuber, 1997), ReLUs (Jarrett et al., 2009; Glorot et al., 2011), and maxout networks (Goodfellow et al., 2013c) are all intentionally designed to behave in very linear ways, so that they are easier to optimize. More nonlinear models such as sigmoid networks are carefully tuned to spend most of their time in the non-saturating, more linear regime for the same reason. This linear behavior suggests that cheap, analytical perturbations of a linear model should also damage neural networks.
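For concreteness, here's a minimal sketch of the "fast way of generating them" the quote alludes to (the paper's fast gradient sign method). The names `model`, `x`, `y`, and `epsilon` are my assumptions for illustration: a PyTorch classifier, an input batch, its labels, and a small step size.

```python
# Minimal sketch of the fast gradient sign method: one gradient, one sign step.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.007):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # If the model behaves close to linearly around x, stepping each coordinate
    # by +/- epsilon in the direction of the gradient's sign reliably raises the loss.
    return (x + epsilon * x.grad.sign()).detach()
```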
Right. The basic idea is something like the transition from special relativity to general relativity (globally flat space vs. manifolds that only look flat locally). The special sense of "linear" says that given two inputs x and y to a function f, f is linear if f(x + y) = f(x) ⊕ f(y) for some pair of operations +, ⊕. The general sense says that f(x + δx) = f(x) ⊕ δf(x, δx) for small perturbations δx in the vicinity of x.
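A quick numerical way to see the "general" version for a ReLU net -- this is just a sketch, the toy network and inputs are made up -- is to compare f(x + δx) against the first-order prediction f(x) + J(x)·δx:

```python
# Toy check that a ReLU net is locally linear: f(x + dx) ~ f(x) + J(x) @ dx,
# as long as the perturbation doesn't push any ReLU across its kink.
import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3))
x = torch.randn(4)
dx = 1e-3 * torch.randn(4)

J = torch.autograd.functional.jacobian(net, x)  # 3x4 Jacobian of the net at x
print(torch.allclose(net(x + dx), net(x) + J @ dx, atol=1e-5))  # usually True
```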
If x is a bit-vector then this can be as simple as saying "flip one bit of the input and here's how to predict which output bits get flipped." When you're building a hash function in cryptography, you try to push the algorithm towards a non-answer here: about half the bits should get flipped, and you shouldn't be able to predict which they are. But of course that kind of predictability is still a security vulnerability even if + and ⊕ are not XORs.
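To make the bit-vector point concrete, here's a small sketch (the key and inputs are made up) contrasting a GF(2)-linear map, where you can predict exactly which output bits flip, with SHA-256, where roughly half of them flip and you can't say which:

```python
# Flip one input bit and look at the output difference: exactly predictable for a
# linear (XOR-with-key) map, avalanche-like for a cryptographic hash.
import hashlib

def as_int(b: bytes) -> int:
    return int.from_bytes(b, "big")

x = b"some input block"
x_flipped = bytes([x[0] ^ 0x01]) + x[1:]  # flip the low bit of the first byte

key = b"0123456789abcdef"  # same length as x
def linear_map(m: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(m, key))

# Linear map: the output difference is exactly the input difference (one bit).
print(bin(as_int(linear_map(x)) ^ as_int(linear_map(x_flipped))).count("1"))  # 1

# Hash: ~128 of 256 output bits flip, with no way to predict which ones.
diff = as_int(hashlib.sha256(x).digest()) ^ as_int(hashlib.sha256(x_flipped).digest())
print(bin(diff).count("1"))  # typically close to 128
```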
Resisting "adversarial perturbation" in this context means basically that neural nets need to behave a bit more like hash functions, otherwise they will confuse the heck out of us. The problem is that if you just took the core lesson of hash functions -- create some sort of "round function" `r` so that the result is r(r(r(...r(x, 1)..., n - 2), n - 1), n) -- seems like it'd be really hard to invent learning algorithms to tune.