Better HN

0 points · adamzwasserman · 6mo ago

The confound concern is fair: no cross-linguistic comparison is perfectly controlled. The bet is that the effect size (if any) will be large enough to be informative despite the noise. But you're right that it's not ceteris paribus in a strict sense.

Your proposal is interesting though. Synthetic manipulation of morphology within a single language. Have you seen this done? The challenge I'd anticipate is that "genderized English" wouldn't have natural text to train on, so you'd need to generate it somehow, which introduces its own artifacts. But comparing French vs artificially gender-neutralized French might be feasible with existing parallel corpora. Worth thinking about as a follow-up.

On the neural network → brain distance: agreed it's a leap. The claim isn't that transformers are brains, but that if both are extracting structure from language, they might reveal something about what structure is there to extract. Fedorenko's own comparison to "early LLMs" suggests she thinks the analogy has some merit.


2 comments · 1 top-level

tgv · 6mo ago · 1 in thread

> The bet is that the effect size (if any) will be large enough to be informative despite the noise.

But you'd have no grounds to ascribe any effect to the posited difference. Finding no effect might be more informative, but that's unlikely: given the amount of noise, you're bound to find a great many effects.

> Have you seen this done?

Not in LLMs, but there have been experiments with regularized languages, getting people to learn them in second language (L2) acquisition studies. But what I've seen is inconclusive and sometimes outright contradictory.

I think people have also looked at this through information theory, probably using Markov models.

> Fedorenko's own comparison to "early LLMs" suggests she thinks the analogy has some merit.

I don't think she can seriously entertain that thought. We simply know practically nothing about language processes in the brain. What we know about the hardware is very different from LLMs, early or not.

Just to give an indication of how much we don't know: the Stroop effect (https://en.wikipedia.org/wiki/Stroop_effect) is almost 100 years old, and we have no idea what causes it. There's no working model of word recognition. There are only vague suggestions about the origin of the delay. We have no clue how the visual signals for the color and the letters are separated, where they join again, and how that relates to linguistic knowledge. And that's after almost 100 years of an enormous amount of research. If you go to Google Scholar and type "Stroop task", you'll get 197,000 (!) hits. That's nearly 200k articles and the like, resulting in no knowledge whatsoever about a very simple, artificial task.

adamzwasserman (OP) · 6mo ago

On effect size: my primary goal at this stage is falsification. If French and English models show no meaningful differences at matched compute, that's informative: it would support the scaling hypothesis. If they do differ, I'll need to be careful about causal claims, but it would at least challenge the "transformers are magic" framing that treats architecture as the main story.

The L2 regularization and information theory pointers are helpful; they'll go on my reading list. If you have favorites, I'll start there.

On the "we know nothing" point: I'm sympathetic. The Stroop example is exactly why I'm skeptical of strong claims in either direction. 197k papers and no mechanism suggests language processing has properties we don't yet have frameworks to describe. That's not mysticism. It's just acknowledging the gap between phenomenon and explanation.
