
0 points · bumby · 2y ago · 0 comments

I don't want to speak for the OP, but one of the issues they may be poking at is the transference of intelligence. E.g., you are "trained" to play a game, but at some point someone decides to change the rules of the game. Human children can transfer their previous knowledge to the new rule set fairly well. The most intelligent children can do so very quickly without supplemental "training" on the game played under the new rules.

Humans certainly have flaws when it comes to this. I've heard some discussion of the successes and failures of AI here. Can someone in this domain elaborate on current state-of-the-art performance?


6 comments · 2 top-level

famouswaffles · 2y ago · 2 in thread

1. LLMs are general pattern machines. They can generalize and complete, in context, a wide variety of complex non-linguistic patterns. https://general-pattern-machines.github.io/

2. LLMs see positive transfer in multilingual capability. For example, an LLM trained on 500B tokens of English and 10B tokens of French will not speak French like a model trained on only 10B tokens of French. Instead, the model will be nearly as competent in French as it is in English. https://arxiv.org/abs/2108.13349

3. Language models of code reason better even if the benchmarks have nothing to do with code.

https://arxiv.org/abs/2210.07128

bumby (OP) · 2y ago

This is interesting in the context of the other response, which links to poor performance on counterfactuals. I wonder if it is related to how well one domain maps to another? E.g., can they transfer from English to French well because both share similar word classes (nouns, verbs, etc.)? But I believe other languages (e.g., Japanese) change more based on social context than English does. Would an LLM transfer just as well to the latter? In that case, my guess is humans would find it more difficult to transfer as well, so it may not be a good measure.

(Apologies to any linguists. Please correct anything above if I'm off).

famouswaffles · 2y ago

I was just using French as an example. Korean and Japanese also transfer very well. Equally well? Not sure about that.

As for the other post, the degraded performance is still highly non-trivial. Some of it isn't actually poor, just worse.

Even the authors admit humans would see degraded performance on counterfactuals unless given "enough time to reason and revise", something they don't try to do with GPT-4.

Think about it. Do you genuinely believe you would score as accurately on a multiplication test taken in base 8?
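To make the base-8 comparison concrete, here is a minimal Python sketch. It is not from the paper, and the helper names `to_base` and `multiply_in_base` are made up for illustration. The same digit strings give different products depending on the base, so memorized base-10 multiplication facts don't carry over directly:

```python
def to_base(n: int, base: int) -> str:
    """Render a non-negative integer as a digit string in the given base (2 to 10)."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def multiply_in_base(a: str, b: str, base: int) -> str:
    """Multiply two digit strings, reading and writing them in the given base."""
    product = int(a, base) * int(b, base)  # int(text, base) parses digits in that base
    return to_base(product, base)


# Same digit strings, two different rule sets.
print(multiply_in_base("27", "14", 10))  # 378
print(multiply_in_base("27", "14", 8))   # 424
```

Doing the second one by hand means carrying at 8 instead of 10, which is roughly the kind of rule change being discussed here.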

1 more reply

skepticATX · 2y ago · 2 in thread

Exactly. Current LLMs fall over when facing counterfactuals: https://arxiv.org/abs/2307.02477.

This is why it's mostly meaningless for an LLM to pass the bar, but not meaningless for a human to do so. We (rightly, for the most part) assume that a human who passes the bar can transfer those skills to unique and novel situations. We can't make that assumption for LLMs, because they lack the adaptability that is needed for true intelligence.

famouswaffles · 2y ago

That doesn't show that they "fall over". All of the degraded performance is still highly non-trivial. And even the paper admits humans would see degraded performance on counterfactuals as well. The authors think humans might avoid it only with "enough time to reason and revise", something the LLMs being evaluated don't get to do here.

If you took arithmetic tests in base 8, you wouldn't reach the same accuracy either.

skepticATX · 2y ago

Well, sure, but the problem is that LLMs can’t reason and revise, architecturally. Perhaps we can chain together a system that approximates this, but it still wouldn’t be the LLM doing the reasoning itself.
