0 points · ben_w · 1y ago · 0 comments

While I'd agree human failures are different from AI failures, human failures are necessarily also nonsensical. Familiar, human, but nonsensical — consider how often a human disagreeing with another will use the phrase "that's just common sense!"

I think the larger models are consuming on the order of 100k times as much as we do, and while they have a much broader range of knowledge, it's not 100k times as much breadth.

5 comments · 1 top-level

steveBK123 · 1y ago · 4 in thread

Well, it's a breadth & depth problem, isn't it?

Humans are nonsensical, but with somewhat predictable error rates by domain and per individual. So you hire people with the skillsets, domain expertise, and error rates you need.

With an LLM, it's nonsensical in a completely random way from prompt to prompt. It's sort of like talking into a telephone where sometimes Einstein is on the other end, and sometimes it's a drunken child. You have no idea when you pick up the phone which way it's going to go.

We feed these things nearly the entirety of human knowledge, and the output still feels rather random.

LLMs have all that information and then still have a ~10% chance of messing up a simple mathematical comparison that an average 12-year-old would get right.

Other times we delegate much more complex tasks to LLMs and they work great!

But given the nondeterminism, it becomes hard to delegate important tasks when you can't check the work.

ben_w (OP) · 1y ago

Weirdly, I find myself agreeing with your vibes despite disagreeing on — oh, half? — the specifics.

I'm not sure what to make of that, but thought you might find it as curious as I do.

steveBK123 · 1y ago

Thanks, I think. My general vibe is: we are still conflating "wow this is neat" & "this is really useful" in this space. LLMs will lead to some real use cases & products.

I think we are seeing spaghetti against the wall / LLMs as a hammer right now. A lot of what we are seeing thrown out there is a misapplication of LLMs that will slowly fade away. It is likely that other techniques / models are required for some of the applications people are throwing LLMs at.

Feels reminiscent of "blockchain".

phreeza · 1y ago

I haven't worked with LLMs enough to know this but I wonder: are they nonsensical in a truly random way or are they just nonsensical on a different axis in task space than normal humans, and we perhaps just haven't fully internalized what that axis is?

steveBK123 · 1y ago

I'm not really sure, and you can pull up lots of funny examples where various models show progress & regressions in dealing with such mundane, simple math.

As recently as August, "11.10 or 11.9 which is bigger" came up with the wrong answer on ChatGPT, followed by lots of wrong justification for the wrong answer. Even the follow-up math question "what is 11.10 - 11.9" gave me the answer "11.10 - 11.9 equals 0.2".
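For reference, here is a minimal sketch of the arithmetic in question, assuming both numbers are read as plain decimals. The variable names are just illustrative, and `Decimal` is used so that binary floating-point rounding doesn't muddy the result.

```python
from decimal import Decimal

# Read both values as ordinary base-10 decimals (an assumption about
# what the question meant, not a claim about how any model parses it).
a = Decimal("11.10")
b = Decimal("11.9")

larger = max(a, b)   # numeric comparison: 11.10 is the same value as 11.1
difference = a - b   # exact decimal subtraction

print(f"larger: {larger}")                # larger: 11.9
print(f"11.10 - 11.9 = {difference}")     # 11.10 - 11.9 = -0.80
```

So under that reading 11.9 is the bigger number and the difference is -0.80, not 0.2.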

We can quibble about what model I was using, or what edge case I hit, or how quickly they fixed it, but this is 2 years into the very public LLM hype wave, so at some point I expect better.

It gives me pause in asking more complex math questions whose results I cannot immediately verify. And if I can verify them, then again, why would I pay for a tool to ask questions I already know the answer to?
