Identifying the dimensions used by the primate brain to decode faces

(nytimes.com)

197 points · pk2200 · 9y ago · 39 comments

29 comments · 11 top-level

iandanforth · 9y ago · 7 in thread

Key sentence - "the correct choice of face space axes is critical for achieving a simple explanation of face cells’ responses."

They ran PCA over two sets of metrics, took the top 25 components from each set, and combined them into a 50-dimensional space. Fitting a model to the measured responses using these dimensions explained 57% of the variance in real cell firing rates (much better than other models, including a 5-layer CNN).
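To make that pipeline concrete, here is a minimal sketch on synthetic data, not the paper's code. The two metric matrices, their widths (80 and 120), the noise level, and the simulated firing rates are all made up for illustration; only the "top 25 components from each set, 50 dimensions total, linear fit, fraction of variance explained" structure comes from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_components(X, k):
    """Project the rows of X onto their top-k principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

n_faces = 2000
# Hypothetical stand-ins for the two sets of face metrics (random numbers).
metrics_a = rng.normal(size=(n_faces, 80))
metrics_b = rng.normal(size=(n_faces, 120))

# Top 25 components from each set, concatenated into a 50-d face space.
features = np.hstack([top_components(metrics_a, 25),
                      top_components(metrics_b, 25)])

# Simulated cell: firing rate is linear in the 50 features, plus noise.
true_axis = rng.normal(size=50)
rates = features @ true_axis + rng.normal(scale=5.0, size=n_faces)

# Fit a linear model (with intercept) on half the faces, score on the rest.
design = np.column_stack([features, np.ones(n_faces)])
train, test = slice(0, 1000), slice(1000, None)
coef, *_ = np.linalg.lstsq(design[train], rates[train], rcond=None)
pred = design[test] @ coef

residual = np.sum((rates[test] - pred) ** 2)
total = np.sum((rates[test] - rates[test].mean()) ** 2)
explained = 1.0 - residual / total  # fraction of held-out variance explained
print(f"explained variance: {explained:.2f}")
```

The number printed here only reflects the synthetic noise level chosen above; it says nothing about the 57% reported for real cells.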

This is pretty cool. I'd like to see a follow-up where the chosen dimensions were further refined using something a bit more iterative than an arbitrary PCA cutoff.

Also I really want to know what eye motion was present during each trial. This paper presents a very "instantaneous" recognition perspective and doesn't talk about integration over time or the impact of sequential perception of face components on recognition. (E.g., an upside-down face is hard to recognize because your gaze has to move up from the eyes to see the mouth, which is a sequence rarely encountered in the real world.)

gumby · 9y ago

> Also I really want to know what eye motion was present during each trial.

Incredibly important point. Looking at the physiology of the eye and the early visual system together, it's not clear that we can see anything without motion: object motion or saccade (saccade appears to be fundamental). Start with Lettvin's famous frog's eye paper.

More interestingly, it does appear that recognition of a 2D image may be a complex, abstract learned behavior, while 3D recognition is at its core innate. And that writing (well, drawing) preceded reading.

stinos · 9y ago

See the methods: as usual for visual fixation experiments with monkeys, the subjects were trained to maintain fixation, with the fixation window being only a part of the image size:

> stimulus size spanned 5.7 degrees. The fixation spot size was 0.2 degrees in diameter and the fixation window was a square with the diameter of 2.5 degrees.

2.5 is relatively large in my experience, which might be due to lots of noise on the signal, but anyway there won't be true saccades (well, there shouldn't be: parts of the recorded data with saccades outside of that fixation window should get rejected). Probably microsaccades were present, but those aren't mentioned.
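The rejection step described above could look roughly like this. It is only a sketch: it assumes gaze samples are given in degrees relative to the fixation spot, reads the quoted "diameter" of the square window as its side length, and uses made-up trial data. The function name is hypothetical and not taken from the paper.

```python
import numpy as np

WINDOW_SIDE_DEG = 2.5  # square fixation window; value from the quoted methods

def fixation_held(gaze_deg, side=WINDOW_SIDE_DEG):
    """Return True if every (x, y) gaze sample stays inside the square window.

    gaze_deg: array of shape (n_samples, 2), in degrees relative to the
    fixation spot (an assumption about the data layout).
    """
    half = side / 2.0
    return bool(np.all(np.abs(gaze_deg) <= half))

# Made-up trials: one with steady fixation, one with an excursion to 1.9 deg.
trials = {
    "steady": np.array([[0.1, -0.2], [0.3, 0.1], [-0.4, 0.2]]),
    "saccade": np.array([[0.1, 0.0], [1.9, 0.3], [0.2, 0.1]]),
}

# Keep only trials where fixation was maintained throughout.
kept = [name for name, gaze in trials.items() if fixation_held(gaze)]
print(kept)
```

Note that microsaccades smaller than the window would pass this check, which matches the caveat above.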

highd · 9y ago

A 5 layer CNN is absurdly shallow, so this isn't particularly surprising. I routinely work with 150+ layer CNNs - that's fairly standard practice if you want high-quality results.

forgotpw1123 · 9y ago

This comment is absurd. "Your CNN sucks, I know this because I work with better ones all the time. (insert metric that doesn't mean much)"

Oh, BTW I do ML consulting lol

iandanforth · 9y ago

They only generated 2000 images to work with, so I'm not sure any CNN can be expected to do terribly well.

nl · 9y ago

5 layers is AlexNet, which is a decent starting point for visual tasks.

halflings · 9y ago

Since when are 150+ layer CNNs fairly standard?

Inception-v3 has about 50 layers, and that's considered a lot (requires considerable processing power to train).

pk2200 (OP) · 9y ago · 3 in thread

Here's the paper: http://www.cell.com/cell/pdf/S0092-8674(17)30538-X.pdf

paulfrancisco · 9y ago

All the faces seem computer generated. It would have been nice if they had used celebrity faces we can all recognize and see what their system comes up with.

iandanforth · 9y ago

They are real faces that are morphed along one of the 50 axes they came up with. They look less real because it's standard practice to exclude hair, shoulders, and backgrounds from face databases (even if this is totally unrealistic, it helps isolate the system under study).

rangibaby · 9y ago

You can see some of the faces your brain comes up with here: https://youtu.be/VT9i99D_9gI

ragebol · 9y ago · 3 in thread

If I understand this correctly, this works similarly to an embedding in e.g. deep learning: faces are represented by high-dimensional vectors.

Reading the Cell article on this [0], I couldn't help but see the similarities with OpenFace [1].

[0] http://www.cell.com/cell/fulltext/S0092-8674(17)30538-X

[1] https://cmusatyalab.github.io/openface/#overview
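For readers unfamiliar with the embedding idea, here is a rough sketch of how faces represented as vectors can be compared by distance. It does not call OpenFace or any real face model: the vectors are random, the dimension of 128 is an arbitrary choice for illustration, and "same person" is simulated by adding a little noise to one vector.

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return float(np.sum((a - b) ** 2))

dim = 128  # arbitrary embedding size, chosen only for this example

# Made-up embeddings: two "photos" of person A and one of person B.
person_a = normalize(rng.normal(size=dim))
person_a_again = normalize(person_a + 0.02 * rng.normal(size=dim))
person_b = normalize(rng.normal(size=dim))

same = squared_distance(person_a, person_a_again)
different = squared_distance(person_a, person_b)

# A small distance suggests the same identity, a large one a different identity.
print(f"same person: {same:.3f}, different person: {different:.3f}")
```

In a real system the vectors would come from a trained network, and the threshold between "same" and "different" would have to be tuned on labeled data.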

supernumerary · 9y ago

Interestingly in the article, they try to differentiate the system from machine learning:

> Advances in machine learning have been made by training a computerized mimic of a neural network on a given task. Though the networks are successful, they are also a black box because it is hard to reconstruct how they achieve their result.

> “This has given neuroscience a sense of pessimism that the brain is similarly a black box,” she said. “Our paper provides a counterexample. We’re recording from neurons at the highest stage of the visual system and can see that there’s no black box. My bet is that that will be true throughout the brain.”

reader5000 · 9y ago

I believe it is the case that deep NNs at the highest/deepest levels have interpretable feature-specific cells as well. Not sure what they think the difference is.

taneq · 9y ago

At this stage I don't think "black box" is a very fair description. We now understand a fair bit about how artificial neural nets encode information and calculate things. And where we don't understand something, we can still observe and study the processes involved.

gech · 9y ago · 2 in thread

Can this model be translated into computer vision code? I always wonder if it means there are new more efficient models still to be found to copy from nature, or if the model ends up not being the most efficient and just the result of evolution.

ragebol · 9y ago

Sort-of: https://cmusatyalab.github.io/openface/#overview

visarga · 9y ago

It's not more efficient than CNNs - actually, CNNs are one of the most successful ways to do neural nets. Yann LeCun should be proud.

curun1r · 9y ago · 1 in thread

I remember reading about a study [1] that showed that humans recognize faces based on how similar they are to the faces of their parents. It's well known that humans are more easily able to differentiate faces within our own races. But what the study did was look at people who were adopted by parents of a different race. Those people were more easily able to differentiate faces of people of the same race as their adoptive parents and had difficulty differentiating faces of people of their own race. The inference is that we actually store/recognize facial deltas, not full facial images.

I'm curious how this study would explain or contradict the results of that study. Also, were the monkeys raised by monkey parents or human scientists? Monkeys that were allowed to imprint on humans might be more similar to humans and, yet, unrepresentative of monkeys.

[1] I think it was https://www.ncbi.nlm.nih.gov/pubmed/15943669

anothercomment · 9y ago

Hm, I always guessed that the brain simply goes by a kind of "maximum information gain" approach, picking out the features that tend to stick out most to distinguish things. If you aren't used to it, skin color sticks out strongly, so the signal it generates would drown out the other signals.

But that is just a guess and might also go against the findings of the article, which I haven't fully understood yet.

In any case it seems to me that the brain is optimized for processing certain features, as apparently there are people who are unable to recognize faces (face blindness).

daxfohl · 9y ago · 1 in thread

I'm skeptical. Like "faked results" skeptical. Crime witness studies show that most humans can't reproduce another human face that accurately. So-so when the face is at least of the same race, but when of a different race it's a coin flip as to whether they can even recognize it. (That said, I've only heard this on various TV shows, never seen actual research, so the presumption could be wrong). How can primates do so much better with an entirely different species? Or, not even primates, but some AI going through primate neural signals?

tmalsburg2 · 9y ago

Very good point, and yes there's a lot of research showing that humans are terrible at recognizing faces from other races. No idea why you're being downvoted.

mzitelli · 9y ago · 1 in thread

Amazing results. It is incredible to see that our brains do the same process as CNNs, encoding information using multiple layers of neurons to extract features. This makes me think that consciousness could be just an extremely high-level temporal representation of our own senses.

fatjokes · 9y ago

I'm not certain, but I don't think that's a coincidence. I.e. convnets were inspired by the neural visual system.

lexicality · 9y ago

"It is a remarkable advance to have identified the dimensions used by the primate brain to decode faces, he added — and impressive that the researchers were able to reconstruct from neural signals the face a monkey is looking at."

"These dimensions create a mental “face space” in which an infinite number of faces can be recognized. There is probably an average face, or something like it, at the origin, and the brain measures the deviation from this base."

"Dr. Tsao said she was particularly impressed to find she could design a whole series of faces that a given face cell would not respond to, because they lacked its preferred combination of dimensions. This ruled out a possible alternative method of face identification: that the face cells were comparing incoming images with a set of standard reference faces and looking for differences."

I'm surprised that they didn't attempt to generate a face with exactly 0 on all dimensions.

It would be fascinating to know what the most memorable face looks like - and if it's different per-brain. (Presumably it is monkey shaped!)
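The "face space" idea in the quotes above can be sketched in a few lines. This is a toy model under stated assumptions, not the paper's analysis: it assumes a cell's response is simply the projection of a face (a 50-d vector measured from the average face at the origin) onto one preferred axis, and the axis and faces are random. Under that assumption, any face that differs from the average only in directions orthogonal to the axis leaves the cell silent.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 50  # dimensionality of the face space

# Hypothetical preferred axis of one model cell (random, for illustration).
axis = rng.normal(size=dim)

def response(face):
    """Toy cell: response depends only on the projection onto its axis."""
    return float(face @ axis)

average_face = np.zeros(dim)  # the origin of face space

def remove_axis_component(v):
    """Subtract the part of v that lies along the preferred axis."""
    return v - (v @ axis) / (axis @ axis) * axis

# Faces that vary only in directions the cell does not care about.
silent_faces = [remove_axis_component(rng.normal(size=dim)) for _ in range(5)]
silent_responses = [response(f) for f in silent_faces]

# A face shifted one unit along the preferred axis does drive the cell.
driven = response(average_face + axis / np.linalg.norm(axis))

print([round(r, 12) for r in silent_responses], round(driven, 3))
```

In this toy model the all-zeros face (the average) also produces zero response, so a real experiment would need a baseline firing rate to tell "average face" apart from "no preferred features".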

DanielleMolloy · 9y ago

This paper is highly exciting for anybody working on the neural code and so-called encoding models.

Only a few days ago there was a similar study reading faces from human brains by trying to construct a latent space:

https://arxiv.org/abs/1705.07109

https://twitter.com/ccnlab/status/866548346751725568 (animation, over time more dimensions from this latent space (afaik PCA components) are added)

Given the limitations of fMRI (we cannot do single-cell recordings in human brains), the results are not as accurate, but to my knowledge this is the best we can do in humans so far.

folli · 9y ago

How are these macaques able to so finely differentiate faces of a different species? I'm pretty sure I wouldn't be able to differentiate many macaque faces from each other.

linux2647 · 9y ago

I wonder if this is related to why we recognize faces in nonliving objects such as cars.
