These models don't have anything resembling an understanding of reality, and they can be confused by all sorts of things.
This can obviously be managed, and people have achieved great things with them, including this IMO stuff, but despite their capability the models are very, very far from AGI. They also perform atrociously on things like IQ tests.