Because code LLMs are trained on the syntactic form of programs, not their execution, it isn't correct to say the model "understands nullability", even if the correlation between variable annotations and requested completions were perfect (it isn't). Nullability means that under execution the variable in question can become null, which is not a state a model trained only on the syntax of a million programs can "understand". You could get the same result if, say, "Optional" meant the variable becomes poisonous, checking "> 0" were eating it, and "!= None" were the antidote. Human programmers can understand nullability because they have, hopefully, run programs and grasp the semantics of making something null.
The paper could use precise, scientific language (e.g. "the presence of nullable annotation tokens correlates with activation of vectors corresponding to, and emission of, null-check tokens with high precision and accuracy"), which would help us understand what we can rely on the LLM to do and what we can't. But it seems there is some subconscious incentive to muddy how people see these models, in the hope that we start ascribing capabilities to them that they don't have.
The critiques of mental states applied to LLMs are increasingly applicable to us biologicals, and that's the philosophical abyss we're staring down.
I would go much further than this; but this is a de minimis criterion that the LLM already fails.
What zealots eventually discover is that they can hold their "fanatical proposition" fixed in the face of all opposition to the contrary, by tearing down the whole edifice of science, knowledge, and reality itself.
If you wish to assert, against any reasonable thought, that the sky is a pink dome, you can do so -- by asserting first that our eyes are broken, and then, eventually, that we live in some paranoid "philosophical abyss" carefully constructed to permit your paranoia.
This absurdity is exhausting, and I wish one day to find fanatics who would realise it quickly and abandon it -- but alas, I never have.
If you find yourself hollowing out the meaning of words to the point of making no distinctions, denying reality to reality itself, and otherwise arriving at a "philosophical abyss", be aware that it is your cherished propositions which are the madness and nothing else.
Here: no, the LLM does not understand. Yes, we do. It is your job to begin from reasonable premises and abduce reasonable theories. If you do not, you will not.
Of course, now you can say "how do you know that our brains are not just efficient computers that run LLMs", but I feel like the onus of proof lies on the makers of this claim, not on the other side.
It is very likely that human intelligence is not just autocomplete on crack, given all we know about neuroscience so far.
Certainly many ancient people worshiped celestial objects or crafted idols by their own hands and ascribed to them powers greater than themselves. That doesn't really help in the long run compared to taking personal responsibility for one's own actions and motives, the best interests of their tribe or community, and taking initiative to understand the underlying cause of mysterious phenomena.
One of the very first tests I did of ChatGPT way back when it was new was give it a relatively complex string manipulation function from our code base, strip all identifying materials from the code (variable names, the function name itself, etc), and then provide it with inputs and ask it for the outputs. I was surprised that it could correctly generate the output from the input.
So it does have some idea of what the code actually does, not just its syntax.
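A toy illustration of that kind of test (a made-up stand-in, not the actual function from the anecdote): strip every meaningful identifier from a small string routine and ask the model to predict the output for a given input.

```python
# All identifiers stripped to meaningless names, as described above.
# The test: give the model this code plus an input, and ask for the output.
def f(a):
    b = a.split()                # split on whitespace
    c = [x[::-1] for x in b]     # reverse each word
    return " ".join(c)           # rejoin with single spaces

print(f("hello world"))          # a model "executing" this should say: olleh dlrow
```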
What makes you think this? It has been trained on plenty of logs and traces, discussion of the behavior of various code, REPL sessions, etc. Code LLMs are trained on all human language and wide swaths of whatever machine-generated text is available, they are not restricted to just code.
So, what word would you propose we use to mean "an LLM's ability (or lack thereof) to output generally correct sentences about the topic at hand"?
At least, it's likely that they've been trained on undergrad textbooks that explain program behaviors and contain exercises.
Here's the Stanford Encyclopedia of Philosophy's superb write-up: https://plato.stanford.edu/entries/chinese-room/ It covers, engagingly and with playful wit, not just Searle's original argument but its evolution in his writing, other philosophers' responses and criticisms, and Searle's counter-responses.
It would be good to list a few possible ways of interpreting 'understanding of code'. It could possibly include: 1) type inference for the result, 2) nullability, 3) runtime asymptotics, 4) what the code does.
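For concreteness, here is a hypothetical snippet (my own example, not from the paper) that could be "understood" in each of those four senses:

```python
from typing import Optional

def first_positive(xs: list[int]) -> Optional[int]:
    # (3) runtime asymptotics: a single O(len(xs)) scan
    for x in xs:
        if x > 0:
            return x      # (1) inferred result type: int on this path
    return None           # (2) nullability: None when no positive exists

# (4) "what the code does": returns the first positive element of xs, or None
```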
Nobody interrogates each other's internal states when judging whether someone understands a topic. All we can judge it based on are the words they produce or the actions they take in response to a situation.
The way that systems or people arrive at a response is sort of an implementation detail that isn't that important when judging whether a system does or doesn't understand something. Some people understand a topic on an intuitive, almost unthinking level, and other people need to carefully reason about it, but they both demonstrate understanding by how they respond to questions about it in the exact same way.
To not do that is commonly associated with things like being on the spectrum or cognitive deficiencies.
I'd be really curious to see where the "attention" heads of the LLM look when evaluating the nullability of any given token. Does it just trust the Optional[int] return type signature of the function, or does it also skim through the function contents to check whether that's correct?
It's fascinating to me to think that the senior developer skillset of being able to skim through complicated code, mentally make note of different tokens of interest where assumptions may need to be double-checked, and unravel that cascade of assumptions to track down a bug, is something that LLMs already excel at.
Sure, nullability is an example where static type checkers do well, and it makes the article a bit silly on its own... but there are all sorts of assumptions that aren't captured well by type systems. There's been a ton of focus on LLMs for code generation; I think that LLMs for debugging makes for a fascinating frontier.
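One concrete case where the signature alone misleads (a hypothetical example to illustrate the signature-vs-body question): the annotation promises Optional[int], but the body can never actually return None, so anything that only trusts the signature would demand a pointless None check.

```python
from typing import Optional

# The annotation is over-broad: d.get(key, 0) always yields an int,
# so the None branch of Optional[int] is unreachable here. A reader
# (human, probe, or LLM) that skims the body can see this; one that
# only trusts the signature cannot.
def lookup(d: dict[str, int], key: str) -> Optional[int]:
    return d.get(key, 0)
```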
I'm curious if this probing of nullability could be composed with other LLM/ML-based python-typing tools to improve their accuracy.
Maybe even focusing on interfaces such as nullability rather than precise types would work better with a duck-typed language like Python than inferring types directly (i.e. we don't really care that a variable is an int specifically, but rather that it supports __add__ or __sub__ etc., that it is numeric).
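A sketch of what that interface-level view might look like, using typing.Protocol to describe "supports arithmetic" rather than "is an int" (the protocol name and members here are illustrative, not from any existing tool):

```python
from typing import Protocol, runtime_checkable

# Structural ("duck") typing: membership is decided by which dunder
# methods an object actually has, not by its nominal class.
@runtime_checkable
class SupportsArith(Protocol):
    def __add__(self, other): ...
    def __sub__(self, other): ...

# int and float satisfy the protocol; str does not (it has __add__
# for concatenation but no __sub__).
```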
my brother in christ, you invented Typescript.
(I agree on the visualization, it's very cool!)
The LLM will not understand, and is incapable of developing an understanding, of a concept not present in its training data.
If you try to teach it the basics of the misunderstood concept in your chat, it will reflect back a verbal acknowledgement, restated in different words, with some smoothly worded embellishments which look like the external trappings of understanding. It's only a mirage, though.
The LLM will code anything, no matter how novel, if you give it detailed enough instructions and clarifications. That's just a language translation task from pseudo-code to code. Being a language model, it's designed for that.
LLM is like the bar waiter who has picked up on economics and politics talk, and is able to interject with something clever sounding, to the surprise of the patrons. Gee, how does he or she understand the workings of the international monetary fund, and what the hell are they doing working in this bar?
Tony Hoare called it "a billion-dollar mistake" (https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retra...), and Rust made core design choices precisely to avoid this mistake.
In practical AI-assisted coding in TypeScript I have found that it is good to add in Cursor Rules to avoid anything nullable, unless it is a well-designed choice. In my experience, it makes code much better.
But in Typescript, who cares? You’d be forced to handle null the same way you’d be forced to handle Maybe<T> = None | Just<T> except with extra, unidiomatic ceremony in the latter case.
If a language has Maybe<T> = None | Just<T> as a core concept, then it's idiomatic by definition.

Double descent?
"LLM" is a valid category of thing in the world, but it's not a thing like Microsoft Outlook that has well-defined capabilities and limitations. It's frustrating reading these discussions that constantly devolve into one person saying they tried something that either worked or didn't, then 40 replies from other people saying they got the opposite result, possibly with a different model, different version, slight prompt altering, whatever it is.
LLMs possibly have the capability to understand nullability, but that doesn't mean every instance of every model will consistently understand that or anything else. This is the same way humans operate.

Humans can run a 4-minute mile. Humans can run a 10-second 100 meter dash. Humans can develop and prove novel math theorems. But not all humans, not all the time; performance depends upon conditions, timing, luck, and there has probably never been a single human who can do all three. It takes practice in one specific discipline to get really good at it, and this practice competes with or even limits other abilities.

For LLMs, this manifests in differences in the way they get fine-tuned and respond to specific prompt sequences that should all be different ways of expressing the same command or query but nonetheless produce different results. This is very different from the way we are used to machines and software behaving.
Would be fun if they also "cancelled the nullability direction"... the LLMs would probably start hallucinating new explanations for what is happening in the code.
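Mechanically, "cancelling a direction" usually means projecting it out of the hidden state. A minimal sketch, assuming a linear probe has already given us a nullability direction (hypothetical setup, pure Python for clarity):

```python
import math

# Remove the component of `hidden` along `direction`, leaving the
# rest of the representation untouched. After this, a linear probe
# along `direction` reads zero.
def ablate(hidden: list[float], direction: list[float]) -> list[float]:
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    proj = sum(h * u for h, u in zip(hidden, unit))
    return [h - proj * u for h, u in zip(hidden, unit)]
```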
vec(King) - vec(Man) + vec(Woman) = vec(Queen)

The code is entirely wrong. It validates something that's close to a NANP number but isn't actually a NANP number. In particular, the area code cannot start with 0 or 1, nor can the central office code. There are several numbers, like 911, which have special meaning and cannot appear in either position.
You'd get better results if you went to Stack Overflow and stole the correct answer yourself. Would probably be faster too.
This is why "non technical code writing" is a terrible idea. The underlying concept is explicitly technical. What are we even doing?
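For reference, a stricter sketch reflecting the rules mentioned above (area code and central office code start with 2-9, and neither may be an N11 service code). Real NANP rules include further reserved ranges this deliberately does not cover:

```python
import re

# 10-digit US/Canada numbers, optional parens/space/dot/dash separators.
# Both the area code and the central office (exchange) code must begin
# with 2-9, which the regex enforces.
NANP_RE = re.compile(r"^\(?([2-9]\d{2})\)?[ .-]?([2-9]\d{2})[ .-]?(\d{4})$")

def is_valid_nanp(number: str) -> bool:
    m = NANP_RE.match(number.strip())
    if not m:
        return False
    area, exchange, _ = m.groups()
    # N11 service codes (211, 311, ..., 911) cannot appear in either position.
    if area[1:] == "11" or exchange[1:] == "11":
        return False
    return True
```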
From: ‘Keep training it, though, and eventually it will learn to insert the None test’
To: ‘Keep training it, though, and eventually the probability of inserting the None test goes up to xx%’
The former is just horse poop; we all know LLMs generate output with high variance.
When challenged, everybody becomes an eliminative materialist even if it's inconsistent with their other views. It's very weird.
Any comparison to the human brain is missing the point that an LLM only simulates one small part, and that's notably not the frontal lobe. That's required for intelligence, reasoning, self-awareness, etc.
So, no, it's not a question of philosophy. For an AI to enter that realm, it would need to be more than just an LLM with some bells and whistles; an LLM plus something else, perhaps, something fundamentally different which does not yet currently exist.
You can say this is not 'real' understanding, but you, like many others, will be unable to clearly distinguish this 'fake' understanding from 'real' understanding in a verifiable fashion, so you are just playing a game of meaningless semantics.
You really should think about what kind of difference is supposedly so important yet will not manifest itself in any testable way - an invented one.
Let's talk about what they can do and where that's trending.
They are not reasoning.