No need. Just add one more correction to the system prompt.
It's amusing to see hardcore believers in this tech doing mental gymnastics and attacking people whenever evidence that these tools lack intelligence is brought forward. Suddenly the tool is "just" a statistical model, and clearly the user is holding it wrong, doesn't understand how it works, etc.
And why should a "superintelligent" tool need to be optimized for riddles to begin with? Do humans need to be trained on specific riddles to answer them correctly?
If you don't recognise the problem and actively engage your "system 2 brain", it's very easy to leap to the obvious (but wrong) answer. That doesn't mean you're not intelligent and can't work it out if someone points out the problem. It's just that the heuristics you've been trained to adopt betray you here, and that's really not so different a problem from what's tripping up these LLMs.
It may trigger a particularly ambiguous path in the model's token weights, or whatever the technical explanation for this behavior is, which can certainly be addressed in future versions. But what it does is expose the fact that there's no real intelligence here. For all its "thinking" and "reasoning", the tool is incapable of arriving at the logically correct answer unless it was specifically trained for that scenario, or happens to arrive at it by chance. This is not how intelligence works in living beings. Humans don't need to be trained on specific cognitive tasks in order to perform well at them, and our performance is not random.
But I'm sure this is "moving the goalposts", right?
"A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?"
And yet 50% of MIT students fall for this sort of thing[1]. They're not unintelligent; it's just that a specific problem can make your brain fail in weird, specific ways. Intelligence isn't a scale from 0-100, or some binary yes-or-no question; it's a bunch of different things. LLMs probably are less intelligent on a bunch of those scales, but this one specific example doesn't tell you much beyond the fact that they have weird quirks, just like we do.
[1] https://www.aeaweb.org/articles?id=10.1257/08953300577519673...