"Secondly, this doesn't account for ensembling: "fool me once, shame on you. Fool me twice...". Since the images are crafted for a single net, a majority vote should not be fooled by these images."

Trivially "solved" by treating the ensemble as a single object, then constructing a counterexample. My intuition suggests that while the resulting "fooled you" image may very slowly converge on something human recoginizable, it won't do so at a computationally-useful rate.



Now I have to try this out. My intuition tells me it becomes increasingly hard to create a fooling image that looks alien yet fools all the nets in the ensemble, given that they have different settings and parameters. I think these images can only fool one net at a time, and you'd have to get very lucky to evolve the image for the other nets while keeping the same classification. You can't "train" these images on all nets at once simply by treating the ensemble output as a single net.

If your intuition is right though, then the ensemble may be able to counter with a random selection of nets for its vote: You'd need to evolve images for every possible combination and/or account for nets added in the future.
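
Purely as an illustration, the randomized vote could look something like this (same torch-style nets as above; the subset size is a made-up parameter):

    import random

    def random_subset_vote(models, image, subset_size=3):
        # Each query is answered by a randomly chosen subset of the ensemble,
        # so an attacker can't know in advance which nets will cast the vote.
        voters = random.sample(models, subset_size)
        votes = [m(image).argmax(dim=-1).item() for m in voters]
        return max(set(votes), key=votes.count)  # majority class label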


Overfitting has nothing to do with it. See this paper: http://arxiv.org/abs/1412.6572

I believe the original paper tried ensembles and even got the images to work on different networks.


I was not talking about overfitting. I've seen that paper.

The original paper asked whether images that could fool DBN.a could also fool DBN.b. The answer: certainly not all the time. They used the exact same training set and architecture for DBN.a and DBN.b, varying only the random initial weights. I think that setting is too favorable to stand in for a voting ensemble made of nets with different architectures, training sets, and tuning. Can they also find images that fool DBN.a through DBN.z?

Also, to test whether a net can learn to recognize these fooling images, they simply add them to the training set. The noisy images would be far simpler to detect: they have much higher complexity than natural images. For the artsy images, a quick k-nearest-neighbors run should show that they do not look much like anything the net has seen before, which suggests they may be adversarial.
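
Roughly what I have in mind, as a sketch (scikit-learn, with a made-up distance threshold and random stand-in features; you'd use flattened pixels or hidden-layer activations of the real training images):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # One row per training image; random data here is just a placeholder.
    train_features = np.random.rand(1000, 784)
    knn = NearestNeighbors(n_neighbors=5).fit(train_features)

    def looks_adversarial(feature_vector, threshold=10.0):
        # If the query sits far from everything the net has seen before,
        # flag it as a possible fooling image. The threshold would need
        # to be calibrated on held-out natural images.
        distances, _ = knn.kneighbors(feature_vector.reshape(1, -1))
        return bool(distances.mean() > threshold)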


To be clear, I meant this paper (http://arxiv.org/abs/1312.6199) as the original paper on adversarial images. I think they did try transferring them between very different NNs:

>In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.


Very interesting. Thank you for correcting me.

>a relatively large fraction of examples will be misclassified by networks trained from scratch with different hyper-parameters (number of layers, regularization or initial weights). The above observations suggest that adversarial examples are somewhat universal...



