I was thinking the same thing until I scanned through the paper linked above. While neural networks are indeed non-linear, some NNs can still exhibit what amounts to linearity and suffer from adversarial linear perturbations. Here's an example from the paper of the kind of linearity in NNs that the authors are considering:
>The linear view of adversarial examples suggests a fast way of generating them. We hypothesize that neural networks are too linear to resist linear adversarial perturbation. LSTMs (Hochreiter & Schmidhuber, 1997), ReLUs (Jarrett et al., 2009; Glorot et al., 2011), and maxout networks (Goodfellow et al., 2013c) are all intentionally designed to behave in very linear ways, so that they are easier to optimize. More nonlinear models such as sigmoid networks are carefully tuned to spend most of their time in the non-saturating, more linear regime for the same reason. This linear behavior suggests that cheap, analytical perturbations of a linear model should also damage neural networks.
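For concreteness, here's a minimal sketch of the "fast way of generating them" the quote alludes to (the paper's fast gradient sign method). The names `model`, `x`, `y`, and `epsilon` are my assumptions for illustration: a PyTorch classifier, an input batch, its labels, and a small step size.

```python
# Minimal sketch of the fast gradient sign method: one gradient, one sign step.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.007):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # If the model behaves close to linearly around x, stepping each coordinate
    # by +/- epsilon in the direction of the gradient's sign reliably raises the loss.
    return (x + epsilon * x.grad.sign()).detach()
```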
Right. The basic idea is something like the transition from special relativity to general relativity (globally flat space vs. manifolds that only look flat locally). The special sense of "linear" says that given two inputs x and y to a function f, f is linear if f(x + y) = f(x) ⊕ f(y) for some pair of operations +, ⊕. The general sense says that f(x + δx) = f(x) ⊕ δf(x, δx) for small perturbations δx in the vicinity of x.
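A quick numerical way to see the "general" version for a ReLU net -- this is just a sketch, the toy network and inputs are made up -- is to compare f(x + δx) against the first-order prediction f(x) + J(x)·δx:

```python
# Toy check that a ReLU net is locally linear: f(x + dx) ~ f(x) + J(x) @ dx,
# as long as the perturbation doesn't push any ReLU across its kink.
import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3))
x = torch.randn(4)
dx = 1e-3 * torch.randn(4)

J = torch.autograd.functional.jacobian(net, x)  # 3x4 Jacobian of the net at x
print(torch.allclose(net(x + dx), net(x) + J @ dx, atol=1e-5))  # usually True
```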
If x is a bit-vector then this can be as simple as saying "flip one bit of the input and here's how to predict which output bits get flipped." When you're building a hash function in cryptography, you try to push the algorithm towards a non-answer here: about half the bits should get flipped, and you shouldn't be able to predict which they are. But of course that kind of predictability is still a security vulnerability even if + and ⊕ are not XORs.
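To make the bit-vector point concrete, here's a small sketch (the key and inputs are made up) contrasting a GF(2)-linear map, where you can predict exactly which output bits flip, with SHA-256, where roughly half of them flip and you can't say which:

```python
# Flip one input bit and look at the output difference: exactly predictable for a
# linear (XOR-with-key) map, avalanche-like for a cryptographic hash.
import hashlib

def as_int(b: bytes) -> int:
    return int.from_bytes(b, "big")

x = b"some input block"
x_flipped = bytes([x[0] ^ 0x01]) + x[1:]  # flip the low bit of the first byte

key = b"0123456789abcdef"  # same length as x
def linear_map(m: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(m, key))

# Linear map: the output difference is exactly the input difference (one bit).
print(bin(as_int(linear_map(x)) ^ as_int(linear_map(x_flipped))).count("1"))  # 1

# Hash: ~128 of 256 output bits flip, with no way to predict which ones.
diff = as_int(hashlib.sha256(x).digest()) ^ as_int(hashlib.sha256(x_flipped).digest())
print(bin(diff).count("1"))  # typically close to 128
```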
Resisting "adversarial perturbation" in this context means basically that neural nets need to behave a bit more like hash functions, otherwise they will confuse the heck out of us. The problem is that if you just took the core lesson of hash functions -- create some sort of "round function" `r` so that the result is r(r(r(...r(x, 1)..., n - 2), n - 1), n) -- seems like it'd be really hard to invent learning algorithms to tune.