> we think we are recapitulating biology when in fact we are doing nothing of the sort (as these adversarial examples reveal beautifully).
I'm not sure I'd go that far. There's a pretty long list of optical illusions: seeing motion where there clearly is none, misjudging distances, and, most relevant here, things that merely look like a face. Here are a few famous examples: http://brainden.com/face-illusions.htm
Some of those immediately make my brain flag up "FACE". It's only on looking in more detail that I see what else is there, but my visual system is clearly being tricked, as would the visual systems of billions of other people, all grown completely independently. How much better could we do this, and with more subtlety, if we could analyse a whole brain the way we can analyse neural networks, and target a specific brain?
There's an old experiment showing that a kitten raised without ever seeing horizontal lines will fail to see them for the rest of its life past a certain age, so we know that biological systems also struggle when given limited visual input.
I'd also say we're doing matrix-to-label conversions ourselves, unless we're born with some special geometric model. Deep learning also works in layers, so there's no direct matrix-to-label learning happening straight away; that should come much later, after the system has learned to build a higher-level representation of the input.
On a less contrarian note, I wonder how well these things would work if we were to show the networks videos of... everything. Years and years of video. Don't try to add labels yet, but can we add a constraint that the representation should only change slowly? Two very similar frames should not result in drastically different high-level interpretations.
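That constraint is cheap to write down. A minimal sketch of what such a slowness (temporal coherence) penalty might look like, assuming a PyTorch encoder and pairs of consecutive frames (the encoder architecture and the frame tensors here are placeholders, not anything from the article):

    import torch
    import torch.nn as nn

    # Hypothetical encoder: any network mapping a frame to a representation vector.
    encoder = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Flatten(), nn.LazyLinear(128),
    )

    def slowness_loss(frame_t, frame_t_plus_1):
        # Penalize the representation for changing between consecutive frames.
        return ((encoder(frame_t) - encoder(frame_t_plus_1)) ** 2).mean()

    # Two nearly identical frames (batch of 8 RGB frames, 64x64).
    frames = torch.randn(8, 3, 64, 64)
    next_frames = frames + 0.01 * torch.randn_like(frames)
    slowness_loss(frames, next_frames).backward()

On its own this has a trivial solution (map every frame to the same constant vector), so in practice it would need to be paired with something like a reconstruction or contrastive term, but the "change slowly" requirement itself is easy to express.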
Additionally, introducing ancillary modules is not without cost: you might gain robustness to some kinds of adversarial inputs at the expense of becoming vulnerable to others. There are plenty of ways to fool biological visual systems: cf. magic-eye posters, optical illusions, or the various exploits described in Lettvin and Pitts' paper "What the Frog's Eye Tells the Frog's Brain".
I think that remains to be seen, at least in the general case, since we haven't yet agreed on a measure of performance. The debate around adversarial examples can be read as an argument over what the proper measure of performance is, though so far only implicitly, since afaik nobody has formalized a measure of robustness to adversarial examples; it has progressed more by case studies (which is fine, since research into NN robustness is still at quite an early stage, and case studies help illustrate the issues). It can fairly be said that neural nets perform well on the ImageNet benchmark and similar measures of performance. But whether those are good measures, or whether some metric that weights robustness more heavily should be used (and what methods would perform well on it), is the subject of current research, like this research.
When the adversarial examples for humans MATCH the adversarial examples for image classifiers, that would be evidence of having reproduced a biological system.
As for self-driving cars, this is a good argument for having multiple sensing modalities in addition to visual: radar, lidar, and sonar, multiple cameras, and infrared as well as visible light.
It's pretty obvious how to build translational symmetry into a net that's still expressive and easy to train (convolution). But you have to spoon-feed CNNs rotational and other symmetries by augmenting the training data. What you really want is a model that has all the symmetries of your data built in.
My sense is that the community at large regards DL as a magic black box, which it really is not. Complete function basis + finite data = guaranteed wonky interpolation between samples. What you really need to do is restrict the class of expressible functions to the ones you need: build your prior into the model.
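For concreteness, here is a tiny PyTorch check of the distinction above: a convolution commutes with translation by construction (circular padding used so the equivariance is exact), but not with rotation, which is why rotations have to be spoon-fed via augmentation. The layer and input are made up for the illustration:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode='circular', bias=False)
    x = torch.randn(1, 1, 32, 32)

    # Translation: shifting the input then convolving == convolving then shifting.
    shift = lambda t: torch.roll(t, shifts=5, dims=3)
    print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-5))   # True

    # Rotation: no such guarantee for an ordinary conv layer with a generic kernel.
    rot = lambda t: torch.rot90(t, k=1, dims=(2, 3))
    print(torch.allclose(conv(rot(x)), rot(conv(x)), atol=1e-5))       # False in general

Group-equivariant convolutions and similar constructions are attempts to bake other symmetries into the model in the same way.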
Remember that these tricky images rely on the fact that machine-learning models are differentiable and high-dimensional. There are a lot of ways to transition between, say, the desktop dimension and the cat dimension, and it's all continuous, so we're guaranteed to be able to influence the machine in that sort of direction.
You could imagine somehow taking all of the adversarial examples and categorically augmenting a machine's learning to know about the examples, creating a cat-masquerading-as-desktop dimension. But all you've done is make a lot more space (by adding a dimension) and so the next iteration of adversarial examples will be able to proceed by the same process as before, just on this new augmented machine.
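That "influence the machine in a direction" step is literally gradient descent on the input rather than on the weights. A minimal sketch of the one-step, FGSM-style version, assuming some differentiable PyTorch classifier named model that returns logits (everything here is illustrative, not the exact attack from the article):

    import torch
    import torch.nn.functional as F

    def nudge_toward_class(model, image, target_class, eps=0.01):
        # Gradient of the "call this a desktop" loss with respect to the pixels...
        image = image.clone().detach().requires_grad_(True)
        F.cross_entropy(model(image), target_class).backward()
        # ...then a small step that makes the target class more likely.
        return (image - eps * image.grad.sign()).detach()

Iterating that with a small eps is enough to produce the cat-that-classifies-as-desktop kind of example while the image looks unchanged to us.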
By adding enough adversarial examples to the training set, you can absolutely immunize a model against adversarial perturbations of the training data.
The problem is that the volume of "not very different" data points surrounding an example grows exponentially with the input dimension, so you need to train for much longer, and your "adversarial protection" will likely overfit to the neighborhood of training examples, which doesn't help with unseen data.
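For what it's worth, the "add adversarial examples to the training set" idea has a standard form as an inner loop during training. A sketch, assuming a PyTorch model, optimizer, and train_loader already exist (the eps and the equal loss weighting are arbitrary choices):

    import torch
    import torch.nn.functional as F

    def perturb(model, images, labels, eps=0.03):
        # Untargeted one-step attack: push the images uphill on the true-label loss.
        images = images.clone().detach().requires_grad_(True)
        F.cross_entropy(model(images), labels).backward()
        return (images + eps * images.grad.sign()).detach()

    for images, labels in train_loader:
        adv_images = perturb(model, images, labels)
        optimizer.zero_grad()
        loss = (F.cross_entropy(model(images), labels)
                + F.cross_entropy(model(adv_images), labels))
        loss.backward()
        optimizer.step()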
That's basically the idea behind GANs.
The problem isn't that there are adversarial inputs. The problem is that the adversarial inputs aren't also adversarial (or detectable) to the human visual system.
It's an interesting question; maybe (some of) these adversarial vulnerabilities are due to a handful of bad training examples. You could formulate it as a search problem: are there particular images (or small groups of images) responsible for the adversarial vulnerabilities? That would suggest that some of these perturbations are really just exploiting the fact that neural nets tend to "memorize" some of the data, so we're not exploiting some deep structural feature so much as feeding the net the echo of an input it has learned to automatically classify as, say, a computer/desk[0].
It would be a good project, but I don't have enough GPUs on hand to train scores of deep nets from scratch.
Assuming one were to bite the bullet, it might also be worth trying different data augmentation strategies. Most of the time we try to eke out additional performance/robustness with the same set of transformations (translation, rotation, cropping, rescaling, etc.), but if the net is vulnerable to adversarial examples because of something in the training set, then you might just be making sure that the adversarial vulnerability is present everywhere in the image and at multiple scales.
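For reference, the "same set of transformations" usually looks something like this torchvision stack (the parameters are just illustrative):

    from torchvision import transforms

    # Standard augmentation: cropping/rescaling, rotation, flips, color jitter.
    augment = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
    ])

If the vulnerability really does come from a few memorized training images, every one of those transforms gets applied to them too, which is exactly the "smeared across positions and scales" worry.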
On a related note, there's an interesting paper about universal adversarial perturbations, i.e. those that can be added to any image and thereby induce a misclassification with high probability[1]. This effect holds even across different models, so the same perturbation can cause a misclassification in different architectures.
------
0. Neural nets learn by some combination of abstraction and memorization. If, for some reason, many members of a particular class are hard to generalize, then it's possible that the net instead learns to identify a few particular aspects of those classes (ones not usually present in other images) and responds disproportionately when those features are present. If such features are not obvious to human visual inspection, then we get misclassifications without any insight into why they happened.
Yes, assuming we have 10000 different training images. Divide these into 5 sets of 2000 each and train 5 networks, one per set. Assuming that 2000 images are plenty for this application, we will have 5 well-trained networks with similar performance on a test set.
BUT
They will work slightly differently internally, and those "inverse gradient search" methods (or whatever they are called) might only be able to manipulate an image for one network at a time with specifically chosen additive noise, while the other 4 are unimpressed.
That's assuming that the manipulation can't be targeted at all 5 classifiers at the same time.
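The defence being described is essentially majority voting over the independently trained nets, so an attack crafted against one of them has to transfer to at least 3 of the 5. A sketch, assuming models is the list of the 5 trained PyTorch classifiers:

    import torch

    def ensemble_predict(models, image):
        # Each model votes with its top class; take the most common vote.
        with torch.no_grad():
            votes = torch.stack([m(image).argmax(dim=1) for m in models])
        return votes.mode(dim=0).values

Whether the assumption holds is exactly the transferability question: as the universal-perturbation result mentioned upthread shows, perturbations crafted against one model often carry over to others.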
First figure out that it's a picture (or photo) of something, then figure out what it's a picture of.
Suppose you were to place an example like that on a stop sign, fooling a car into thinking the sign was a tree. The car might blow through the intersection at speed as a result.
The training strategy they used provides a template for doing even more exotic manipulations. For example, you could train an adversarial example that looked like one thing when viewed from far away but something quite different up close. Placing an image like that by a road could result in an acute, unexpected change in the car's behavior (e.g. veering sharply to avoid a "person" that suddenly appeared).
I think these adversarial examples are a near-irrelevant issue for self-driving cars. If someone does something bad, we prosecute them. It's the same whether you're throwing oil onto a highway, covering up stop signs with adversarial stop signs, or whatever else you might want to do.
Now if there were an exploit that caused all self-driving cars in the whole country to suddenly crash into walls, that would be one thing. But these image-based attacks are limited to a single intersection or road at a time. And after a single car crashes, the intersection gets closed. So if you really want to kill a few people, why not just go and stab them in the neck?
Robust systems expect that some inferences will be mistaken (noisy). That's why you want to feed multiple sensor types into different models, and use some kind of mixture of experts and/or probabilistic fusion.
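A toy sketch of the probabilistic-fusion half of that: each sensor's model produces class probabilities, and a weighted product rule combines them, so a fooled camera gets outvoted by lidar and radar. The two-class setup and the numbers are made up for illustration:

    import numpy as np

    def fuse(per_sensor_probs, weights=None):
        # Weighted product-of-experts fusion over independent sensor models.
        probs = np.stack(per_sensor_probs)
        weights = np.ones(len(probs)) if weights is None else np.asarray(weights)
        log_fused = (weights[:, None] * np.log(probs + 1e-12)).sum(axis=0)
        fused = np.exp(log_fused - log_fused.max())
        return fused / fused.sum()

    # Camera is fooled into "tree"; lidar and radar still say "stop sign".
    camera = np.array([0.9, 0.1])   # [tree, stop sign]
    lidar  = np.array([0.1, 0.9])
    radar  = np.array([0.2, 0.8])
    print(fuse([camera, lidar, radar]))   # stop sign wins (~0.8)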
Your idea is similar to an appeal to security through obscurity. It might work sometimes, but not in general.
(Noise does not help, because you can still discover a gradient to descend by averaging repeated trials.)
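A sketch of that averaging trick: even if the classifier's score for the target class is noisy (or you only have black-box query access), sampling random perturbations and weighting them by the score they produce recovers a usable gradient estimate. score_fn here is a hypothetical query function, not anything from the article:

    import numpy as np

    def estimate_gradient(score_fn, image, sigma=0.01, num_samples=200):
        # Score-function (NES-style) estimator: average score-weighted random probes.
        grad = np.zeros_like(image)
        for _ in range(num_samples):
            delta = np.random.randn(*image.shape)
            grad += score_fn(image + sigma * delta) * delta
        return grad / (num_samples * sigma)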