The problem with these bromides is not that they're wrong, it's that they're not even wrong. They're predictive nulls.
What observable differences can we expect between an entity with True Understanding and an entity without True Understanding? It's a theological question, not a scientific one.
I'm not an AI booster by any means, but I do strongly prefer we address the question of AI agent intelligence scientifically rather than theologically.
It's the same mechanism behind artisanal food, artist struggles, and luxury goods. It is the metaphysical properties we attach to objects or the frames we use to interpret strips of events. We author all of these and then promptly forget we've done so, instead believing they are simply reality.
It's the "it's just a stochastic parrot!" camp that's doing the theological work (and maybe also those in the Singularity camp...).
That said, I do think there's value in having people understand what "Understanding" means, which is kind of a theological (philosophical :D) question. IMHO, in everyday language there's a functional part (which can be tested with benchmarks) and a subjective part (i.e. what does it feel like to understand something?). Most people without the appropriate training simply mix up the two, and together with whatever insecurities they have about AI taking over the world (which IMHO is inevitable to some extent), they just express their strong opinions about it online...
We never had much demand to define how humans are intelligent or conscious etc., since it is too hard and was relegated to a few frontier researchers. With LLMs we now do have that demand, but the science wasn't ready. So we are all collectively searching in the dark, trying to work out whether we are different from these programs and, if so, how. I certainly can't do that. I do know that LLMs are useful, but I also suspect that AI (aka AGI nowadays) has not yet been reached.
- clear generalizability
- insane growth rates (go back and look at where we were maybe 2 years ago and then consider the already signed compute infrastructure deals coming online)
And still say with a straight face that this is some kind of parlor trick or monkeys with typewriters.
We don't need to run LLMs for years. The point is to look at where we are today and consider that performance gets 10x cheaper every year.
LLMs and agentic systems are clearly not monkeys with typewriters regurgitating training data. Their capabilities have grown, and continue to grow, at extremely fast rates.
But for super hard tasks, there is no situation where you just dump a few papers in for context, add a prompt, and the LLM spits out the correct answer. It's likely that the lead on such a project would need to additionally train the LLM on their local dataset, then parse through a lot of experimental data, then likely run multiple LLMs for many iterations, homing in on the solution, verifying intermediate results, and repeating the cycle again and again. Other team members would be doing the same in parallel. All in all, for such a huge, hard task, a year of cumulative machine-hours is not something outlandish.
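A quick sketch of the arithmetic, taking the 10x-per-year figure at face value (that's the claim above, not something verified here). The function name and parameters are made up for illustration:

```python
def cost_after(years, cost_today=1.0, yearly_drop=10.0):
    """Relative cost of the same workload after `years`,
    assuming cost falls by a constant factor `yearly_drop` each year."""
    return cost_today / (yearly_drop ** years)

# Relative cost of a fixed workload today and over the next three years.
for y in range(4):
    print(y, cost_after(y))
```

Under that assumption, a workload that looks prohibitively expensive today costs a thousandth as much three years out. Whether the rate actually holds is a separate question.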
Alternative perspective: the science may not have been ready, so instead we brute-forced the problem through the training of LLMs. Consider what the overall goal function of LLM training is: predicting tokens that continue a given input in a way that makes sense to humans, in the fully general meaning of that statement.
It's a single training process that gives LLMs the ability to parse plain language - even if riddled with 1337-5p34k, typos, grammar errors, or mixed languages - and extract information from it, or act on it; it's the same single process that makes them equally good at writing code and poetry, at finding bugs in programs, inconsistencies in data, corruptions in images, possibly all at once. It's what makes LLMs good at lying and spotting lies, even if the input is a tree of numbers.
(It's also why "hallucinations" and "prompt injection" are not bugs, but fundamental facets of what makes LLMs useful. They cannot and will not be "fixed", any more than you can "fix" humans to be immune to confabulation and manipulation. It's just the nature of fully general systems.)
All of that, and more, is encoded in this simple goal function: if a human looks at the output, will they say it's okay or nonsense? We just took that and threw a ton of compute at it.
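For anyone who wants the goal function spelled out: a minimal sketch of the usual pretraining loss, assuming "goal function" here means plain next-token cross-entropy on human-written text. The "will a human say it's okay" framing is looser than this, and later fine-tuning stages add their own objectives. The toy probabilities and names below are invented for illustration, not taken from any real model:

```python
import math

def next_token_loss(predicted_probs, actual_next_token):
    """Cross-entropy at one position: -log of the probability
    the model assigned to the token that actually came next."""
    return -math.log(predicted_probs[actual_next_token])

def sequence_loss(per_position_probs, tokens):
    """Average next-token loss; tokens[i + 1] is the target at position i."""
    losses = [next_token_loss(p, t)
              for p, t in zip(per_position_probs, tokens[1:])]
    return sum(losses) / len(losses)

tokens = ["the", "cat", "sat"]
# Hypothetical model outputs: one distribution per predicted position.
confident = [{"cat": 0.9, "dog": 0.1}, {"sat": 0.8, "ran": 0.2}]
unsure = [{"cat": 0.5, "dog": 0.5}, {"sat": 0.5, "ran": 0.5}]

print(sequence_loss(confident, tokens))
print(sequence_loss(unsure, tokens))
```

Training pushes this number down across an enormous amount of text. Nothing in the loss mentions code, poetry, or lies; those abilities fall out of getting the next token right.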
This is spot on, and it's one of the reasons why I don't think putting LLMs or LLM-based devices into anything that requires security is a good idea.