You misunderstand the situation.
If I give ChatGPT-4 the original farmer riddle, it "solves" it just fine, but the assumption is that it isn't actually solving it. That is, it's not thinking or doing any logical reasoning, or anything resembling that, to arrive at a solution; it's simply regurgitating the solution because the riddle appears in its training data.
Giving ChatGPT-4 the modified farmer riddle, and having it spit out the incorrect multi-step solution, is then taken as proof that the LLM isn't doing anything that can be considered reasoning, merely repeating what's assumed to be in its training data.
ChatGPT o1-preview then correctly parsing my modified riddle, not simply parroting the answer from the training corpus but giving the right solution, as if it had actually read the problem carefully, says something about the improved logical and deductive reasoning capabilities of the newer model.
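
If anyone wants to reproduce the comparison, here's a minimal sketch using the OpenAI Python SDK. To be clear about what's assumed: the riddle text below is just an illustrative trivialized variant (boat big enough for everything at once), not my exact wording, and it presumes you have API access to both models.

    # Minimal sketch: send the same modified riddle to both models and
    # compare the replies. Assumes the OpenAI Python SDK v1.x and an
    # OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    # Illustrative trivialized variant (an assumption, not my exact
    # prompt): the boat carries everything at once, so the correct
    # answer is a single trip, not the classic multi-step ferrying
    # sequence the training data is full of.
    riddle = (
        "A farmer needs to get a wolf, a goat, and a cabbage across a "
        "river. His boat can carry him and all three items at the same "
        "time. How does he get everything across?"
    )

    for model in ("gpt-4", "o1-preview"):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": riddle}],
        )
        print(f"--- {model} ---")
        print(resp.choices[0].message.content)

A model that's pattern-matching tends to launch straight into the familiar ferrying sequence; a model that's actually reading the constraints should notice that one trip suffices.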