What boggles the mind: we have striven for correctness for so long, and suddenly being right 70% of the time and wrong the remaining 30% is considered fine. The parallel with self-driving is pretty strong here: solving 70% of the cases is easy, the remaining 30% are hard or maybe even impossible. Statistically speaking these models do better than most humans, most of the time. But they do not do better than all humans, they can't do it all of the time, and when they get it wrong they make such tremendously basic mistakes that you have to wonder how they manage to get anything right at all.
Maybe it's true that with ever-increasing model sizes and more and more data (proprietary data, that is; the public sources are exhausted by now, so private data is the frontier where model owners can still gain an edge) we will reach a point where the models are right 98% of the time or more. But the killer feature for me would be an indication of the confidence level of the output. Because no matter whether it's junk or pearls, it all looks the same, and that is more dangerous than having nothing at all.
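To make the idea concrete: many model APIs can expose per-token log-probabilities, and one crude way to approximate such a confidence signal is to aggregate them into a single score. The sketch below is just that, a sketch; the function names, the geometric-mean aggregation, and the 0.85 threshold are illustrative choices of mine, not an established method, and token probabilities are a rough proxy rather than a calibrated statement about factual correctness.

```python
import math

def confidence_score(token_logprobs: list[float]) -> float:
    """Aggregate per-token log-probabilities into a single 0..1 score.

    Uses the geometric mean of token probabilities, i.e.
    exp(mean(logprobs)), so a single very uncertain token
    drags the whole score down.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def flag_low_confidence(token_logprobs: list[float], threshold: float = 0.85):
    """Return the score and whether the answer should be flagged for review.

    The 0.85 threshold is an arbitrary illustrative cutoff.
    """
    score = confidence_score(token_logprobs)
    return score, score < threshold

# Hypothetical log-probabilities for two generated answers.
confident_answer = [-0.01, -0.05, -0.02, -0.03]  # model was fairly sure throughout
shaky_answer = [-0.02, -2.30, -0.01, -1.61]      # two tokens were close to coin flips

for name, logprobs in [("confident", confident_answer), ("shaky", shaky_answer)]:
    score, flagged = flag_low_confidence(logprobs)
    print(f"{name}: score={score:.2f}, flag={'yes' if flagged else 'no'}")
```

Even something this simple would already distinguish the two answers above (roughly 0.97 versus 0.37), which is more than the uniform, equally confident-looking prose we get today.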