> I don't think we see a cat and our brain have it frame by frame adjust our synaptic weights (or whatever brains do)
I think that "whatever brains do" is doing a lot of heavy lifting here. Some of those "whatevers" will be isomorphic to a frame-level analysis that pulls out structural commonalities, or at least close enough that the comparison isn't a clunky reductionist analogy.
When we see what we think is a cat, what we have categorised as a cat, I don't think we are looking at it from each angle and going, cat, cat, cat.
I think something like the 'free-energy principle' is required to trigger a re-assessment. So while visually we may receive 20fps of cat images, most of them are discarded unless there is some novelty that challenges expectation.
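
To make that concrete, here's a toy sketch of what I mean by novelty-gated processing. It is not the free-energy principle itself, just a threshold on prediction error standing in for "surprise". The function name `gated_updates`, the three-number "frames", and the `threshold` and `rate` values are all made up for illustration.

```python
def gated_updates(frames, threshold=0.5, rate=0.2):
    """Return the indices of frames that triggered a re-assessment."""
    expectation = list(frames[0])  # the first frame seeds the expectation
    reassessed = []
    for i, frame in enumerate(frames[1:], start=1):
        # Prediction error: mean absolute difference between expected and seen.
        error = sum(abs(f - e) for f, e in zip(frame, expectation)) / len(frame)
        if error > threshold:
            # Surprising frame: nudge the expectation toward it and record it.
            expectation = [e + rate * (f - e) for f, e in zip(frame, expectation)]
            reassessed.append(i)
        # Otherwise the frame matches expectation and is simply discarded.
    return reassessed


# Hypothetical feature vectors standing in for "cat" and "not a cat".
cat = [1.0, 0.0, 1.0]
dog = [0.0, 1.0, 0.0]
frames = [cat] * 20 + [dog] + [cat] * 5
print(gated_updates(frames))  # [20]
```

Out of 26 frames only the one that violates expectation does any work; the other cat frames pass through without touching the internal state, which is roughly the "not going cat, cat, cat" point.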