> we think we are recapitulating biology when in fact we are doing nothing of the sort (as these adversarial examples reveal beautifully).
I'm not sure I'd go that far. There's a pretty long list of optical illusions: seeing motion where there clearly is none, misjudging distances, and, most relevant here, things that merely look like a face. Here are a few famous examples: http://brainden.com/face-illusions.htm
Some of those immediately make my brain flag up "FACE". It's only on looking in more detail that I see what else is there, but my visual system is clearly being tricked, as would the visual systems of billions of other people, all grown completely independently. How much better could we do this, and with more subtlety, if we could analyse a whole brain the way we can analyse neural networks, and target a specific brain?
There's an old experiment showing that a kitten raised without ever seeing horizontal lines will fail to see them for the rest of its life past a certain age, so we know that biological systems also struggle when given limited visual input.
I'd also say we're doing matrix-to-label conversions ourselves, unless we're born with some special geometric model. Deep learning also works in layers, so there's no direct matrix-to-label learning happening straight away; that should come much later, after the system has learned to build a higher-level representation of the input.
On a less contrarian note, I wonder how well these things would work if we were to show the networks videos of... everything. Years and years of video. Don't try to add labels yet, but can we add a constraint that the representation should only change slowly? Two very similar frames should not result in drastically different high-level interpretations.
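That constraint is cheap to write down. A minimal sketch of what such a slowness (temporal coherence) penalty might look like, assuming a PyTorch encoder and pairs of consecutive frames (the encoder architecture and the frame tensors here are placeholders, not anything from the article):

    import torch
    import torch.nn as nn

    # Hypothetical encoder: any network mapping a frame to a representation vector.
    encoder = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Flatten(), nn.LazyLinear(128),
    )

    def slowness_loss(frame_t, frame_t_plus_1):
        # Penalize the representation for changing between consecutive frames.
        return ((encoder(frame_t) - encoder(frame_t_plus_1)) ** 2).mean()

    # Two nearly identical frames (batch of 8 RGB frames, 64x64).
    frames = torch.randn(8, 3, 64, 64)
    next_frames = frames + 0.01 * torch.randn_like(frames)
    slowness_loss(frames, next_frames).backward()

On its own this has a trivial solution (map every frame to the same constant vector), so in practice it would need to be paired with something like a reconstruction or contrastive term, but the "change slowly" requirement itself is easy to express.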
Additionally, introducing ancillary modules is not without cost: you might gain robustness to some kinds of adversarial inputs at the expense of becoming vulnerable to others. There are plenty of ways to fool biological visual systems: cf. magic-eye posters, optical illusions, or the various exploits described in Lettvin and Pitts' paper "What the Frog's Eye Tells the Frog's Brain".
I think that remains to be seen, at least in the general case, since we haven't yet agreed on a measure of performance. The debate around adversarial examples can be read as an argument over what the proper measure of performance is, though so far only implicitly, since afaik nobody has formalized a measure of robustness to adversarial examples; it has progressed more by case studies (which is fine, since research into NN robustness is still at quite an early stage, and case studies help illustrate the issues). It can fairly be said that neural nets perform well on the ImageNet benchmark and similar measures of performance. But whether those are good measures, or whether some metric that weights robustness more heavily should be used (and what methods would perform well on it), is the subject of current research, like this research.
When the adversarial examples for humans MATCH the adversarial examples for image classifiers, that would be evidence of having reproduced a biological system.
As for self-driving cars, this is a good argument for having multiple sensing modalities in addition to visual: radar, lidar, and sonar, multiple cameras, and infrared as well as visible light.
It's pretty obvious how to build translational symmetry into a net that's still expressive and easy to train (convolution). But you have to spoon-feed CNNs rotational and other symmetries by augmenting the training data. What you really want is a model that has all the symmetries of your data built in.
My sense is that the community at large regards DL as a magic black box, which it really is not. Complete function basis + finite data = guaranteed wonky interpolation between samples. What you really need to do is restrict the class of expressible functions to the ones you need: build your prior into the model.
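For concreteness, here is a tiny PyTorch check of the distinction above: a convolution commutes with translation by construction (circular padding used so the equivariance is exact), but not with rotation, which is why rotations have to be spoon-fed via augmentation. The layer and input are made up for the illustration:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode='circular', bias=False)
    x = torch.randn(1, 1, 32, 32)

    # Translation: shifting the input then convolving == convolving then shifting.
    shift = lambda t: torch.roll(t, shifts=5, dims=3)
    print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-5))   # True

    # Rotation: no such guarantee for an ordinary conv layer with a generic kernel.
    rot = lambda t: torch.rot90(t, k=1, dims=(2, 3))
    print(torch.allclose(conv(rot(x)), rot(conv(x)), atol=1e-5))       # False in general

Group-equivariant convolutions and similar constructions are attempts to bake other symmetries into the model in the same way.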
Remember that these tricky images rely on the fact that machine-learning models are differentiable and high-dimensional. There are a lot of ways to transition between, say, the desktop dimension and the cat dimension, and it's all continuous, so we're guaranteed to be able to influence the machine in that sort of direction.
You could imagine somehow taking all of the adversarial examples and categorically augmenting a machine's learning to know about the examples, creating a cat-masquerading-as-desktop dimension. But all you've done is make a lot more space (by adding a dimension) and so the next iteration of adversarial examples will be able to proceed by the same process as before, just on this new augmented machine.
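That "influence the machine in a direction" step is literally gradient descent on the input rather than on the weights. A minimal sketch of the one-step, FGSM-style version, assuming some differentiable PyTorch classifier named model that returns logits (everything here is illustrative, not the exact attack from the article):

    import torch
    import torch.nn.functional as F

    def nudge_toward_class(model, image, target_class, eps=0.01):
        # Gradient of the "call this a desktop" loss with respect to the pixels...
        image = image.clone().detach().requires_grad_(True)
        F.cross_entropy(model(image), target_class).backward()
        # ...then a small step that makes the target class more likely.
        return (image - eps * image.grad.sign()).detach()

Iterating that with a small eps is enough to produce the cat-that-classifies-as-desktop kind of example while the image looks unchanged to us.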
By adding enough adversarial examples to the training set, you can absolutely immunize a model against adversarial perturbations of the training data.
The problem is that the volume of "not very different" data points surrounding an example grows exponentially with the input dimension, so you need to train for much longer, and your "adversarial protection" will likely overfit to the neighborhood of training examples, which doesn't help with unseen data.
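For what it's worth, the "add adversarial examples to the training set" idea has a standard form as an inner loop during training. A sketch, assuming a PyTorch model, optimizer, and train_loader already exist (the eps and the equal loss weighting are arbitrary choices):

    import torch
    import torch.nn.functional as F

    def perturb(model, images, labels, eps=0.03):
        # Untargeted one-step attack: push the images uphill on the true-label loss.
        images = images.clone().detach().requires_grad_(True)
        F.cross_entropy(model(images), labels).backward()
        return (images + eps * images.grad.sign()).detach()

    for images, labels in train_loader:
        adv_images = perturb(model, images, labels)
        optimizer.zero_grad()
        loss = (F.cross_entropy(model(images), labels)
                + F.cross_entropy(model(adv_images), labels))
        loss.backward()
        optimizer.step()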
That's basically the idea behind GANs.
The problem isn't that there are adversarial inputs. The problem is that the adversarial inputs aren't also adversarial (or detectable) to the human visual system.
It's an interesting question; maybe (some of) these adversarial vulnerabilities are due to a handful of bad training examples. You could formulate it as a search problem: are there particular images (or small groups of images) responsible for the adversarial vulnerabilities? That would suggest that some of these perturbations are really just exploiting the fact that neural nets tend to "memorize" some of the data, so we're not exploiting some deep structural feature so much as feeding the net the echo of an input it has learned to automatically classify as, say, a computer/desk[0].
It would be a good project, but I don't have enough GPUs on hand to train scores of deep nets from scratch.
Assuming one were to bite the bullet, it might also be worth trying different data augmentation strategies. Most of the time we try to eke out additional performance/robustness with the same set of transformations (translation, rotation, cropping, rescaling, etc.), but if the net is vulnerable to adversarial examples because of something in the training set, then you might just be making sure that the adversarial vulnerability is present everywhere in the image and at multiple scales.
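For reference, the "same set of transformations" usually looks something like this torchvision stack (the parameters are just illustrative):

    from torchvision import transforms

    # Standard augmentation: cropping/rescaling, rotation, flips, color jitter.
    augment = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
    ])

If the vulnerability really does come from a few memorized training images, every one of those transforms gets applied to them too, which is exactly the "smeared across positions and scales" worry.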
On a related note, there's an interesting paper about universal adversarial perturbations, i.e. those that can be added to any image and thereby induce a misclassification with high probability[1]. This effect holds even across different models, so the same perturbation can cause a misclassification in different architectures.
------
0. Neural nets learn by some combination of abstraction and memorization. If, for some reason, many members of a particular class are hard to generalize, then it's possible that the net instead learns to identify a few particular aspects of those classes (ones not usually present in other images) and responds disproportionately when those features are present. If such features are not obvious to human visual inspection, then we get misclassifications without any insight into why they happened.
Yes, assuming we have 10000 different training images. Divide these into 5 sets of 2000 each and train 5 networks, one per set. Assuming that 2000 images are plenty for this application, we will have 5 well-trained networks with similar performance on a test set.
BUT
They will work slightly differently internally, and those "inverse gradient search" methods (or whatever they are called) might only be able to manipulate an image for one network at a time with specifically chosen additive noise, while the other 4 are unimpressed.
That's assuming that the manipulation can't be targeted at all 5 classifiers at the same time.
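The defence being described is essentially majority voting over the independently trained nets, so an attack crafted against one of them has to transfer to at least 3 of the 5. A sketch, assuming models is the list of the 5 trained PyTorch classifiers:

    import torch

    def ensemble_predict(models, image):
        # Each model votes with its top class; take the most common vote.
        with torch.no_grad():
            votes = torch.stack([m(image).argmax(dim=1) for m in models])
        return votes.mode(dim=0).values

Whether the assumption holds is exactly the transferability question: as the universal-perturbation result mentioned upthread shows, perturbations crafted against one model often carry over to others.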
First figure out that it's a picture (or photo) of something, then figure out what it's a picture of.
Suppose you were to place an example like that on a stop sign, fooling a car into thinking the sign was a tree. The car might blow through the intersection at speed as a result.
The training strategy they used provides a template for doing even more exotic manipulations. For example, you could train an adversarial example that looked like one thing when viewed from far away but something quite different up close. Placing an image like that by a road could result in an acute, unexpected change in the car's behavior (e.g. veering sharply to avoid a "person" that suddenly appeared).
I think these adversarial examples are a near-irrelevant issue for self-driving cars. If someone does something bad, we prosecute them. It's the same whether you're throwing oil onto a highway, covering up stop signs with adversarial stop signs, or whatever else you might want to do.
Now if there were an exploit that caused all self-driving cars in the whole country to suddenly crash into walls, that would be one thing. But these image-based attacks are limited to a single intersection or road at a time. And after a single car crashes, the intersection gets closed. So if you really want to kill a few people, why not just go and stab them in the neck?
Robust systems expect that some inferences will be mistaken (noisy). That's why you want to feed multiple sensor types into different models, and use some kind of mixture of experts and/or probabilistic fusion.
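A toy sketch of the probabilistic-fusion half of that: each sensor's model produces class probabilities, and a weighted product rule combines them, so a fooled camera gets outvoted by lidar and radar. The two-class setup and the numbers are made up for illustration:

    import numpy as np

    def fuse(per_sensor_probs, weights=None):
        # Weighted product-of-experts fusion over independent sensor models.
        probs = np.stack(per_sensor_probs)
        weights = np.ones(len(probs)) if weights is None else np.asarray(weights)
        log_fused = (weights[:, None] * np.log(probs + 1e-12)).sum(axis=0)
        fused = np.exp(log_fused - log_fused.max())
        return fused / fused.sum()

    # Camera is fooled into "tree"; lidar and radar still say "stop sign".
    camera = np.array([0.9, 0.1])   # [tree, stop sign]
    lidar  = np.array([0.1, 0.9])
    radar  = np.array([0.2, 0.8])
    print(fuse([camera, lidar, radar]))   # stop sign wins (~0.8)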
Your idea is similar to an appeal to security through obscurity. It might work sometimes, but not in general.
(Noise does not help, because you can still discover a gradient to descend by averaging repeated trials.)
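A sketch of that averaging trick: even if the classifier's score for the target class is noisy (or you only have black-box query access), sampling random perturbations and weighting them by the score they produce recovers a usable gradient estimate. score_fn here is a hypothetical query function, not anything from the article:

    import numpy as np

    def estimate_gradient(score_fn, image, sigma=0.01, num_samples=200):
        # Score-function (NES-style) estimator: average score-weighted random probes.
        grad = np.zeros_like(image)
        for _ in range(num_samples):
            delta = np.random.randn(*image.shape)
            grad += score_fn(image + sigma * delta) * delta
        return grad / (num_samples * sigma)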