Analyzing whether or not LLMs have intelligence is missing the forest for the trees. This technology is emerging in a capitalist society that is hyper-optimized to adopt useful things at the expense of almost everything else. If the utility/price point is hit for a problem, the technology will be adopted for it regardless of whether it is intelligent.
If a language model can't solve problems in a programming language, then we are just fooling ourselves in less well-defined domains of "thought".
Software engineering is where the rubber meets the road for intelligence and economics when viewing our society as a complex system. Software engineering salaries are above average precisely because most people cannot do the work.
From that point of view the progress is not impressive at all. The current models are really not that much better than GPT-4 was in April 2023.
AI art is a better example, though. There is zero progress being made now. It is only impressive at the most surface level, to someone not involved in art who can't see how incredibly limited the AI art models are. We have already moved on to video, building the same half-baked, useless models that are only good for marketing videos in press releases about progress and one-off social media posts about how much progress is being made.
For example, a team of humans is extremely reliable, much more reliable than one human, but a team of AIs isn't meaningfully more reliable than one AI, since an AI is already an ensemble model. That means even if an AI could replace a person, it probably can't replace a team for a long time; you still need the other team members there, so the AI didn't really replace a human, it just became a tool for humans to use.
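The point about teams can be made concrete with a toy reliability calculation (the 80% per-agent accuracy and the majority-vote rule here are assumptions for illustration, not claims about any real model): agents that err independently gain a lot from voting, while copies of one model that all make the same mistakes gain nothing.

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n agents is correct,
    assuming each agent is independently correct with probability p."""
    need = n // 2 + 1  # votes required for a majority (n odd)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

p = 0.8  # assumed per-agent accuracy
# A team of 5 independent agents: accuracy rises well above any one member.
print(majority_vote_accuracy(p, 5))  # ~0.942

# Perfectly correlated copies of one model all fail on the same inputs,
# so "ensembling" them leaves accuracy stuck at p = 0.8.
print(p)
```

This is why an ensemble of one model's outputs behaves more like a single agent than like a human team: the benefit of a team comes from decorrelated errors.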
I personally wouldn't be surprised if we start to see benchmarks around this kind of cooperation and the ability to orchestrate complex systems in the next few years.
Most benchmarks really focus on a single problem, not on multiple real-time problems while orchestrating third-party actors who may or may not succeed at certain tasks.
But I don't think anything prevents these models from eventually being able to do that.