If you asked me what I expected: since this one has "thinking", I'd have guessed it would do something like generate the image without Waldo first, then insert Waldo somewhere into that image as an "edit".
Yep. “Where’s Waldo” has been a classic challenge for generative models for a while because it requires understanding the entire concept (there’s only one Waldo), while also holding up to scrutiny when you examine any individual, ordinary figure.
I experimented with procedurally generating Waldo-style scavenger-hunt images with Flux models, with rather disappointing (if unsurprising) results.