I know it falls down on other stuff. To me, it seems to operate in a kind of dream-logic: sometimes it hallucinates things, but other times it is clearly reasoning in a pretty deep way.
This "Sparks of AGI" talk by Sebastien Bubeck covers a lot of amazing examples. (I'd provide a link, but I'm not on a good connection and YouTube isn't loading for me.)
Just because an AI makes mistakes that a human wouldn't doesn't mean it's not smarter than a human. It's an alien. The things it finds important and fixates on are not the same things we find important, and there are a lot of things that most humans would get wrong that GPT-4 gets right. Sure, it's not great at understanding physical objects, but it's been trained on pure text, not by living in a physical world and playing with toys and having adults put names to shapes and movements. We have an intuition built from living every second of our lives interacting with the physical world. I'd expect this sort of AI to find simple problems with physical objects hard in the same way that a human finds pure mathematics hard.
Not to mention that we've often accidentally trained it to give confident, plausible-sounding answers instead of saying "I don't know". An AGI is not necessarily going to look and sound like a really smart human.