"What should or shouldn’t be a wh-island" is literally a statement of "what words might come after some other words"! An LLM encodes billions of such statements, just unfortunately in a quantity and form that makes them incomprehensible to an unaided human. That part is strictly worse; but the LLM's statements model language well enough to generate it, and that part is strictly better.
As I read Norvig's essay, it's about exactly that tradeoff: whether a simple and comprehensible but inaccurate model shows more promise than one that's far more accurate but incomprehensible except in statistical terms, with the aid of a computer. I understand there's a large group of people who think Norvig is wrong or incoherent; but when those people have no accomplishments except within the framework they themselves have constructed, what am I supposed to think?
Beyond that, if I have a model that tells me whether a sentence is valid, then I can always try different words until I find a sentence the model accepts. Any sufficiently good recognizer is thus also a generator. Chomsky never proposed anything capable of that; but that just means his models were bad, not that he was working on a different task.
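Here's a minimal sketch of that reduction. The is_valid oracle and the toy vocabulary are hypothetical stand-ins for a real grammaticality model; the point is only that a yes/no judgment is enough, in principle, to generate by brute-force search.

    import itertools

    def generate(is_valid, vocabulary, max_len=4):
        # Enumerate word sequences in order of length and keep the ones
        # the oracle accepts. Exponential in sentence length, so purely
        # a proof of concept: a recognizer is also a (very slow) generator.
        for length in range(1, max_len + 1):
            for words in itertools.product(vocabulary, repeat=length):
                sentence = " ".join(words)
                if is_valid(sentence):
                    yield sentence

    # Hypothetical stand-in for a real grammaticality model:
    def toy_oracle(sentence):
        return sentence in {"dogs bark", "cats sleep", "dogs chase cats"}

    vocab = ["dogs", "cats", "bark", "sleep", "chase"]
    print(list(generate(toy_oracle, vocab, max_len=3)))
    # -> ['dogs bark', 'cats sleep', 'dogs chase cats']

The search is hopelessly inefficient, of course; the argument is about capability in principle, not about how LLMs actually decode.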
As to the relationship between signals from biological neurons and ANN activations, I mean something like the paper linked below, whose authors write:
> Thus, even though the goal of contemporary AI is to improve model performance and not necessarily to build models of brain processing, this endeavor appears to be rapidly converging on architectures that might capture key aspects of language processing in the human mind and brain.
https://www.biorxiv.org/content/10.1101/2020.06.26.174482v3
I emphasize again that I believe these results have been oversold in the popular press, but the idea that an ANN trained on brain output (including written language) might provide insight into the physical, causal structure of the brain is pretty mainstream now.