What a Deep Neural Network thinks about selfies

(karpathy.github.io)

262 points · vkhuc · 10y ago · 50 comments

41 comments · 14 top-level

nightpool · 10y ago · 5 in thread

>Be female. Women are consistently ranked higher than men. In particular, notice that there is not a single guy in the top 100.

This sounds true, but it can't be the real reason—selfies are ranked relative to the other images by the same user. So unless users are taking a lot of #selfies of people of different genders, we can assume the dataset is already controlled for the gender of the person in the image, no? Unless there's some confounding factor at play, such as some demographic segment being more likely to optimize for good selfies occasionally but have boring feeds the rest of the time.

would be super interesting, if the data is available, to normalize this by exposure. Of the people that saw an image, how many clicked "like"?

the8472 · 10y ago

Well, one of the other factors is long hair and the tendency to oversaturate the face. Those factors don't seem independent to me: men are less likely to sport long hair, and they're also less likely to oversaturate the face to measure up to some skin perfection standard (think of it as the photographic equivalent of makeup).

> but it can't be the real reason

Can't? On top of the above-listed aspects, it is entirely possible that there is a bias that both sexes find female appearance somewhat more aesthetically pleasing.

Similar to how focus group testing for computer voices tends to result in female voices being chosen (at least that's what I often hear, couldn't find a solid source).

Even if the bias is small the correlated factors would amplify it when you're optimizing for a maximum, i.e. for the top selection.
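A minimal sketch of the tail effect described above, assuming two groups whose scores are normally distributed with the same spread and a small difference in mean. The distributions, the group labels, and the function name `top_k_share` are illustrative assumptions, not data from the article, and the sketch models only the mean shift, not the correlated factors.

```python
import random

def top_k_share(shift, n_per_group=20_000, k=100, seed=0):
    """Fraction of the k highest scores that belong to the shifted group."""
    rng = random.Random(seed)
    # Group "a": baseline scores. Group "b": same spread, mean moved by `shift`.
    scores = [(rng.gauss(0.0, 1.0), "a") for _ in range(n_per_group)]
    scores += [(rng.gauss(shift, 1.0), "b") for _ in range(n_per_group)]
    scores.sort(reverse=True)
    top = scores[:k]
    return sum(1 for _, group in top if group == "b") / k

# Compare the top-100 share for a few hypothetical mean shifts.
for shift in (0.0, 0.25, 0.5, 1.0):
    print(f"shift={shift}: share of top 100 = {top_k_share(shift):.2f}")
```

Under these assumptions, a shift of well under one standard deviation tends to be enough for the shifted group to dominate the top 100, even though the two groups overlap heavily overall.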

nightpool · 10y ago

Neither of those explains why it would rank above the average of other female faces, in general.

Discussion about this with the author reveals that I was misinterpreting how they were collecting averages. I was assuming the "like" count was coming from each photo collected, but instead they collected the photos and average likes in separate steps, where the average likes were across recent posts by that user, rather than the selfies by that user.
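A rough sketch of the kind of per-user baseline described here, assuming a selfie's like count is compared against the average likes over that user's recent posts. The function names and the numbers are hypothetical, and this is not the blog author's actual pipeline.

```python
from statistics import mean

def baseline_likes(recent_post_likes):
    """Average likes over a user's recent posts (selfies and non-selfies alike)."""
    return mean(recent_post_likes)

def relative_score(selfie_likes, recent_post_likes):
    """Selfie likes divided by the user's baseline; above 1.0 means above that user's average."""
    baseline = baseline_likes(recent_post_likes)
    return selfie_likes / baseline if baseline > 0 else 0.0

# Hypothetical users: one with a large audience, one with a small audience.
popular = {"selfie": 900, "recent": [1000, 1100, 950, 1050]}
small = {"selfie": 60, "recent": [30, 40, 35, 45]}

for name, user in (("popular", popular), ("small", small)):
    print(name, round(relative_score(user["selfie"], user["recent"]), 2))
```

With a baseline like this, the raw like count matters less than how a selfie does relative to the same user's other posts, so a user with more friends does not automatically score higher.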

1 more reply

visarga · 10y ago

> focus group testing for computer voices tends to result in female voices being chosen

I personally prefer the Alex voice from Mac OS to female voices. It has nice intonation. If only I could make it correct some of the mistakes it makes, for example not being able to distinguish "read" in past tense from "read" in present tense which makes it sound silly. Another error it makes is confusing "live" as in "live concert" with "live" as in "live in USA" (they are called heteronyms and are a special case in TTS).

1 more reply

lqdc13 · 10y ago

Yeah, female users probably post more pictures and also probably have more friends.

nightpool · 10y ago

This also would be controlled for by the tools the blog author used, though: if a woman has more friends, then she would also probably get more likes on all of the rest of her images. Not sure if posting more photos would drive the average up or down, but it would probably drive the "above the baseline" selfies in the same way.

1 more reply

lqdc13 · 10y ago · 4 in thread

A guide on how to take a good selfie that others will like:

  be female
  be blonde
  be attractive

Incidentally, Christian Rudder did a really good "study" on the dating site pictures a few years ago:

http://blog.okcupid.com/index.php/dont-be-ugly-by-accident/

steve_taylor · 10y ago

A better guide on how to take a selfie:

    Don't take a selfie.

1 more reply

tdaltonc · 10y ago

And, if you are female, chop off your forehead.

gus_massa · 10y ago

Also, long hair in front of your shoulders (no ponytail).

bakhy · 10y ago

apparently, she should be white too.

misiti3780 · 10y ago · 3 in thread

One thing I always found interesting is that LeCun is credited with developing convnets, but Hinton is apparently credited with scaling them and showing the world how great they are in the paper from 2012. Why was Hinton's group (Toronto) able to publish these groundbreaking results before LeCun's group (NYU)?

pramodliv1 · 10y ago

Geoff Hinton answers this question in episode 6 of the Talking Machines podcast. http://www.thetalkingmachines.com/blog/2015/3/13/how-machine...

Geoff Hinton had grad students who wanted to work on the problem, but Yann LeCun didn't.

"In about 2012, it should have been Yann's group, but Yann was unlucky, he didn't have a student who really wanted to do it. But we had a couple of students who wanted to do it and we took all of Yann's techniques and added some of our own."

misiti3780 · 10y ago

interesting - i took the course but did not notice that - thanks!

Houshalter · 10y ago

IIRC the deep learning revolution started with pretraining and RBMs, which I believe Hinton invented.

JabavuAdams · 10y ago · 3 in thread

How to take a good selfie: don't be black or dark-skinned, unless you're a celebrity.

How do we prevent our AIs from learning racism?

EDIT> Informative article, BTW. A good read.

Lawtonfogle · 10y ago

If a given question has an answer that is due to racism, the answer is still the answer. For example, if society has some underlying racism that factors into what it considers attractive, that doesn't change what it considers attractive.

I don't think these algorithms are learning racism. They are only being blunt in revealing what already exists.

JabavuAdams · 10y ago

> If a given question has an answer that is due to racism, the answer is still the answer.

That's why it's important to be clear about the question. This ConvNet doesn't really answer the question "What makes a good selfie". It answers a much narrower question, one that is more complicated to state.

The absence of reflection in the system means that if it's used to answer a question that's superficially similar to the designer's intent, there's no way to reason around the bias in the training data.

Imagine I'm a Canadian who trains an automated turret to classify friend / foe based on data from Afghanistan and Iraq. I've not trained the system to answer "Is this group of pixels a friend / foe", in the general sense. If the system is used outside the narrow context of its validity, say in Northern Ireland, or in a civilian Muslim neighbourhood in Paris, we should expect bad results.

So you're right to point out that the racism is in the social context. But I'm arguing that we don't actually want a classifier to learn that if there's a good chance it'll be used in a way that discards or ignores that social context. Same as using an expert system outside its domain.

apu · 10y ago

This is an important point. People are thinking about it, and a lot of it will have to do with how the input data is gathered and curated.

netheril96 · 10y ago · 3 in thread

One caveat with this machine-inspired knowledge: it is prone to error, probably more so than humans, at least for now.

For example, if you train a CNN directly with human faces, its recognition rate comes way below what a human is capable of. Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability. Without much domain specific tuning, an AI's insight is far from reliable.

nl · 10y ago

This is more wrong than right.

The example is correct, but not for the reasons stated. Humans are very, very good at face recognition. However, CNNs are pretty close to human performance for face detection.

> Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability. Without much domain specific tuning, an AI's insight is far from reliable.

This just isn't the case. Take the GoogLeNet or VGGNet papers, build the CNN as described using Caffe/whatever, train as described in the paper and you'll end up with something that is pretty much on par with human performance for categorizing ImageNet images.

Take that same CNN architecture, and retrain it for another domain and it will perform roughly as well there too, for the task of categorizing into ~1K-10K image classes.

This isn't domain specific tuning. It's domain specific training, which is very different (although collecting the data is a big job).

> Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability.

For CNNs, this is pretty much entirely false.

netheril96 · 10y ago

A GoogLeNet or VGGNet has tons of parameters. How many convolutional layers are stacked together, the size and stride of each one, where to put the dropout layers, where to put the fully connected layers, how they are connected together, global learning rate and momentum and decay, local learning rate and momentum and decay: each of these myriad parameters has an unpredictable effect on the final result. The initialization of the network also has a major bearing on the final outcome. It is almost a chaotic system where nothing small can be safely ignored. One time my result of training a CNN was swung by the `batch_size` parameter, and to this day I don't know how.

Those parameters are exactly the type of handcrafted optimizations I am talking about. You cannot just fill in arbitrary numbers and expect the network to fare well. In fact, you cannot even expect it to converge.

You can take those papers and build a world class classifier only because someone else has taken all the time to optimize for the specific case. Once you switch the task, the result will be OK, but nowhere close to what a human or a true AI would give you. Not until you take the time to optimize the parameters.

2 more replies

eivarv · 10y ago

What type of handcrafted optimizations are you talking about here?

The state of the art I've read about* (deep CNNs) in recent years relies more on generalized tricks like augmenting the training data (artificially inflating the data set), pre-training and fine-tuning, ReLU, regularization methods like dropout, etc.

For anyone interested, here [1] are some benchmarks.

* Late night here, but often in the vein of this [0] work.

[0]: https://www.cs.toronto.edu/~ranzato/publications/taigman_cvp...

[1]: http://vis-www.cs.umass.edu/lfw/results.html
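As an illustration of the "augmenting the training data" trick mentioned above, here is a small sketch using horizontal flips and random crops on an image stored as a nested list. The helper names and the toy 6x6 image are assumptions made for the example; real pipelines would use an image library and many more transformations.

```python
import random

def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def random_crop(image, out_h, out_w, rng):
    """Cut a random out_h x out_w window out of the image."""
    top = rng.randint(0, len(image) - out_h)
    left = rng.randint(0, len(image[0]) - out_w)
    return [row[left:left + out_w] for row in image[top:top + out_h]]

def augment(image, n_variants, out_h, out_w, seed=0):
    """Produce n_variants randomly flipped and cropped copies of one image."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        img = hflip(image) if rng.random() < 0.5 else image
        variants.append(random_crop(img, out_h, out_w, rng))
    return variants

# Toy 6x6 "image" whose pixel values encode their position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
variants = augment(image, n_variants=8, out_h=4, out_w=4)
print(len(variants), "variants of size", len(variants[0]), "x", len(variants[0][0]))
```

Each original image yields several slightly different training examples, which is what "artificially inflating the data set" refers to.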

trhway · 10y ago · 3 in thread

Looking at the top 100, one can only wonder how Hollywood figured it out well before the mighty power of the computer :)

goodJobWalrus · 10y ago

For me, this thing about having the top of your head cut from the picture is new. Who would have thought..

falcolas · 10y ago

Makes a bit of sense, in combination with the "be female" advice, cutting off the forehead puts the center of the photograph closer to her cleavage, and typically shows off her entire chest.

2 more replies

mirimir · 10y ago

It seems that eyes and mouth, and their alignment, matter most for female attractiveness.[0,1]

[0] http://www.nbcnews.com/id/34482178/ns/health-skin_and_beauty...

[1] http://www.ncbi.nlm.nih.gov/pubmed/25836007

anunderachiever · 10y ago · 2 in thread

I would like to see a deep dream selfie ...

Feed it an initial picture (noise, clouds, a selfie) and then backwards manipulate the input to maximize the assessed quality of the "selfie".

I guess that would look pretty funny.
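A toy sketch of the idea of manipulating the input to maximize the assessed score, assuming a tiny logistic scorer in place of the ConvNet so that the gradient with respect to the pixels can be written by hand. The weights, the 16-pixel "image", and the function names are made up for the example; a real version would backpropagate through the trained network in a deep-learning framework.

```python
import math
import random

def score(pixels, weights, bias):
    """Stand-in scorer: sigmoid of a weighted sum of pixel values."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def ascend_input(pixels, weights, bias, steps=100, lr=0.5):
    """Gradient ascent on the input: nudge pixels in the direction that raises the score."""
    x = list(pixels)
    for _ in range(steps):
        s = score(x, weights, bias)
        # d(score)/d(pixel_i) = s * (1 - s) * w_i for a logistic scorer.
        grad = [s * (1.0 - s) * w for w in weights]
        # Keep pixel values inside the valid [0, 1] range.
        x = [min(1.0, max(0.0, p + lr * g)) for p, g in zip(x, grad)]
    return x

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(16)]  # hypothetical "trained" weights
start = [0.5] * 16                                  # flat grey starting image
result = ascend_input(start, weights, bias=0.0)
print("score before:", round(score(start, weights, 0.0), 3))
print("score after: ", round(score(result, weights, 0.0), 3))
```

The model's weights stay fixed; only the input changes, which is what separates this from ordinary training.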

Tyr42 · 10y ago

He did run something like that for cropping. He showed his favourite two "rude" ones at the bottom, where the 'Net cropped out the face of the person taking the selfie.

yoha · 10y ago

Actually, he used random crops and selected the highest rated. A "deep dream selfie" would actually run the neural network in reverse so as to generate a completely different image.
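The random-crop approach mentioned here can be sketched as follows, assuming some scoring function is available. The `mean_brightness` scorer is a placeholder for the trained network, and the crop-size limits and function names are assumptions for the example rather than details from the article.

```python
import random

def crop(image, box):
    """Return the sub-image inside box = (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def best_random_crop(image, scorer, n_crops=200, min_frac=0.5, seed=0):
    """Score random crops and keep the best; the full frame is the starting candidate."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    best_box = (0, 0, width, height)
    best_score = scorer(image)
    for _ in range(n_crops):
        w = rng.randint(max(1, int(width * min_frac)), width)
        h = rng.randint(max(1, int(height * min_frac)), height)
        left = rng.randint(0, width - w)
        top = rng.randint(0, height - h)
        box = (left, top, left + w, top + h)
        s = scorer(crop(image, box))
        if s > best_score:
            best_box, best_score = box, s
    return best_box, best_score

def mean_brightness(image):
    """Placeholder scorer: average pixel value."""
    values = [v for row in image for v in row]
    return sum(values) / len(values)

# Toy 8x8 image whose right half is bright.
image = [[1.0 if c >= 4 else 0.0 for c in range(8)] for r in range(8)]
box, best = best_random_crop(image, mean_brightness)
print("best box:", box, "score:", round(best, 3))
```

Unlike the gradient-based "deep dream" idea, this only selects among crops of the existing pixels, so it cannot generate a new image.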

vonnik · 10y ago · 2 in thread

I think it's less about the head getting chopped than about having "the head take up about 1/3 of the image," as Karpathy says. So what the net is learning is composition, or balance in an image, which is really cool. The rule of thirds is actually pretty well known to people in photography:

https://en.wikipedia.org/wiki/Rule_of_thirds

(Our deep-learning framework http://deeplearning4j.org missed his list, but it's got working convnets, too.)

Jack000 · 10y ago

possibly, but none of the cropped examples have cropped chins. It's also well known in photography that you can cut off someone's forehead, but never their chin.

pshc · 10y ago

Echoing a law of video games: "nobody looks up"!

danblick · 10y ago · 1 in thread

This is neat. I bet Facebook or OkCupid are sitting on all sorts of click data that could be used to develop tools for helping people make their photos look better. (Even if, personally, I can't wait for a cultural backlash against internet narcissism...)

[Edit: Even better, he didn't use click data to train the model, just public likes.]

visarga · 10y ago

The idea to use a convnet to reframe the selfie is neat. Makes it 5% better. Also, if it can be run on the phone, it could possibly warn people they are about to post a shitty selfie before they do.

thewhitetulip · 10y ago · 1 in thread

Well, you don't need to ask a deep neural network to say that selfies are getting stupid daily with teens sticking their tongues out

visarga · 10y ago

BEEP BEEP. Bad selfie detected. You run the risk of making a fool of yourself! BEEP BEEP

RealityVoid · 10y ago

It seems this neural network has a sense of humor if you look at the "Finding the Optimal Crop for a selfie" area.

You can see it optimized the last selfie by cropping the face fully out of the picture.. :))

spikels · 10y ago

DNN is a key technology of the future. I highly recommend the education program Professor Karpathy mentions at the end of this post. All are excellent and free.

amai · 10y ago

I have seen similar results before: https://medium.com/the-physics-arxiv-blog/the-algorithm-that...

JoachimS · 10y ago

A really good read. Good intro to ConvNets, a well designed and implemented test. And funny.
