So, can I go ahead and abstract out the ability of the mind being discussed? Basically, given a category, the vision-processing module in the brain processes different features of the image (here, "feature" in the machine-learning sense). These categories can be hierarchical: faces, humans, and creatures, for example, could form a hierarchy that the brain refers to when it is trying to identify a face and switches to a mode that needs a holistic view of the image rather than isolated parts. I admit that imagining how this happens biologically (physiologically) is hard for me.
My question is: am I correct in the above inference? I'd like to suggest an experiment now :D :P