undefined | Better HN

0 pointshackinthebochs1y ago0 comments

>As soon as you ask it novel questions it falls apart.

What do you mean by novel? Almost all sentences it is prompted on are brand new and it mostly responds sensibly. Surely there's some generalization going on.

0 comments

chongli1y ago

Novel as in requiring novel reasoning to sort out. One of the classic ways to expose the issue is to take a common puzzle and introduce irrelevant details and perhaps trivialize the solution. LLMs pattern match on the general form of the puzzle and then wander down the garden path to an incorrect solution that no human would fall for.

The sort of generalization these things can do seems to mostly be the trivial sort: substitution.

hackinthebochsOP1y ago

Why is your criteria for "on the path towards AGI" so absolutist? For it to be on the path towards AGI and not simply AGI it has to be deficient in some way. Why does the current failure modes tell you its on the wrong path? Yes, it has some interesting failure modes. The failure mode you mention is in fact very similar to human failure modes. We very much are prone to substituting the expected pattern when presented with a 99% match to a pattern previously seen. They also have a lot of inhuman failure modes as well. But so what, they aren't human. Their training regimes are very dissimilar to ours and so we should expect some alien failure modes owing to this. This doesn't strike me as good reason to think they're not on the path towards AGI.

Yes, LLMs aren't very good at reasoning and have weird failure modes. But why is this evidence that its on the wrong path, and not that it just needs more development that builds on prior successes?

1 more reply

moffkalast1y ago

Well the problem with that approach is that LLMs are still both incredibly dumb and small, at least compared to the what, 700T params of a human brain? Can't compare the two directly, especially when one has a massive recall advantage that skews the perception of that. But there is still some inteligence under there that's not just memorization. Not much, but some.

So if you present a novel problem it would need to be extremely simple, not something that you couldn't solve when drunk and half awake. Completely novel, but extremely simple. I think that's testable.

chongli1y ago

It’s not fair to ask me to judge them based on their size. I’m judging them based on the claims of their vendors.

Anyway the novel problems I’m talking about are extremely simple. Basically they’re variations on the “farmer, 3 animals, and a rowboat” problem. People keep finding trivial modifications to the problem that fool the LLMs but wouldn’t fool a child. Then the vendors come along and patch the model to deal with them. This is what I mean by whack-a-mole.

Searle’s Chinese Room thought experiment tells us that enough games of whack-a-mole could eventually get us to a pretty good facsimile of reasoning without ever achieving the genuine article.

1 more reply

j / k navigate · click thread line to collapse

0 comments

chongli1y ago

The sort of generalization these things can do seems to mostly be the trivial sort: substitution.

hackinthebochsOP1y ago

Yes, LLMs aren't very good at reasoning and have weird failure modes. But why is this evidence that its on the wrong path, and not that it just needs more development that builds on prior successes?

1 more reply

moffkalast1y ago

chongli1y ago

It’s not fair to ask me to judge them based on their size. I’m judging them based on the claims of their vendors.

Searle’s Chinese Room thought experiment tells us that enough games of whack-a-mole could eventually get us to a pretty good facsimile of reasoning without ever achieving the genuine article.

1 more reply

j / k navigate · click thread line to collapse