What boggles the mind: we have striven for correctness for so long, and suddenly being right 70% of the time and wrong the remaining 30% is considered fine. The parallel with self-driving is pretty strong here: solving 70% of the cases is easy, the remaining 30% are hard or maybe even impossible. Statistically speaking these models do better than most humans, most of the time. But they do not do better than all humans, they can't do it all of the time, and when they get it wrong they make such tremendously basic mistakes that you have to wonder how they manage to get anything right at all.
Maybe it's true that with ever-increasing model sizes and more and more data (proprietary data, that is; the public sources are exhausted by now, so private data is the frontier where model owners can still gain an edge) we will reach a point where the models are right 98% of the time or more. But the killer feature for me would be an indication of the confidence level of the output. Because no matter whether it's junk or pearls, it all looks the same, and that is more dangerous than having nothing at all.
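To make the idea concrete: many model APIs can expose per-token log-probabilities, and one crude way to approximate such a confidence signal is to aggregate them into a single score. The sketch below is just that, a sketch; the function names, the geometric-mean aggregation, and the 0.85 threshold are illustrative choices of mine, not an established method, and token probabilities are a rough proxy rather than a calibrated statement about factual correctness.

```python
import math

def confidence_score(token_logprobs: list[float]) -> float:
    """Aggregate per-token log-probabilities into a single 0..1 score.

    Uses the geometric mean of token probabilities, i.e.
    exp(mean(logprobs)), so a single very uncertain token
    drags the whole score down.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def flag_low_confidence(token_logprobs: list[float], threshold: float = 0.85):
    """Return the score and whether the answer should be flagged for review.

    The 0.85 threshold is an arbitrary illustrative cutoff.
    """
    score = confidence_score(token_logprobs)
    return score, score < threshold

# Hypothetical log-probabilities for two generated answers.
confident_answer = [-0.01, -0.05, -0.02, -0.03]  # model was fairly sure throughout
shaky_answer = [-0.02, -2.30, -0.01, -1.61]      # two tokens were close to coin flips

for name, logprobs in [("confident", confident_answer), ("shaky", shaky_answer)]:
    score, flagged = flag_low_confidence(logprobs)
    print(f"{name}: score={score:.2f}, flag={'yes' if flagged else 'no'}")
```

Even something this simple would already distinguish the two answers above (roughly 0.97 versus 0.37), which is more than the uniform, equally confident-looking prose we get today.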