Then the "Literature result" column has citations for where similar published results were found. The entries with no "Literature result" citation, like those in the first section, are cases where no similar published results have been found (implying that the AI would not have been trained on the solution). Note that in some cases a published solution was found, but it wasn't similar to the AI's.
(This is all explained in more detail, with caveats, at the top of the page.)
FWIW I've wavered on this topic quite a bit. Not too long ago I leaned more heavily towards "complex cognitive capabilities can be expressed using statistical token generation", but I've since started leaning the other way. I'm not committed, though, so it's great to circle back on the state of things.
FWIW, personally I think it muddies things to frame the question as if "...using statistical token generation" were a limitation. NNs are Turing-complete, so what LLMs do can just be considered "computation" - the fact that they compute via statistical token generation is an implementation detail.
And if you're like most people, "can cognition happen via computation?" is a less controversial question, which then easily puts questions about LLMs and cognition into the "in principle, obviously, but we can debate whether it's achievable or how to measure it" category.