Regardless, I was not arguing that LLMs are more capable than people. My point was that I think there is a bit of selection bias going on. This is perhaps conjecture on my part, but I am inclined to believe that people are keen to notice and make a big fuss over inaccuracies in LLMs, yet are less likely to do so when humans are inaccurate.
Think about the everyday world we live in: how many human-written bugs make it past reviews, tests, and QA and into production? How many doctors give the wrong diagnosis or make a mistake that harms or kills someone? How many lawyers give poor legal advice to clients?
Fallible humans expecting infallible results from their fallible creations is quite the expectation.
We build tools to accomplish things we cannot do well or at all. So we do expect quite a lot from them, even though we know they're not perfect. We have writing and books to aid our memory and transfer knowledge. We have cars and planes to transport us faster than legs ever could... Any apparatus that doesn't help us do something better is aptly called a toy. A toy car can be faster than any human, but it's still a toy.
This seems like a reasonable standard to hold GPT-5 to given the way it’s being marketed. Nobody would care if OpenAI compared it to an enthusiastic high school student with a few hours to poke around Google and come up with an answer.
Do you think there could be a depth vs. breadth difference? Perhaps that PhD aerospace engineer would know more in this one particular area but less across an array of areas of aerospace engineering.
I cannot give an answer to your question. I was mainly trying to point out that we humans are highly fallible too. I would imagine no one with a PhD in any modern field knows everything about their field, nor are they immune to mistakes.
Was this misconception truly basic? I admittedly skimmed those parts of the debate because I am not knowledgeable enough to know who is right or wrong. What was clear is that, if it is indeed a basic concept, there is still quite a bit of contention over it.
> This seems like a reasonable standard to hold GPT-5 to given the way it’s being marketed.
Sure, I suppose I can agree with this.