
atomicnumber3 1mo ago

"When was the last time you reviewed the machine code produced by a compiler?"

Compilers will produce working output given working input literally 100% of my time in my career. I've never personally found a compiler bug.

Meanwhile AI can't be trusted to give me a recipe for potato soup. That is to say, I would under no circumstances blindly follow the output of an LLM I asked to make soup. While I have, every day of my life, gladly sent all of the compiler output to the CPU without ever checking it.

The compiler metaphor is simply incorrect and people trying to say LLMs compile English into code insult compiler devs and English speakers alike.


LiamPowell 1mo ago

> Compilers will produce working output given working input literally 100% of my time in my career.

In my experience this isn't true. People just assume their code is wrong and mess with it until they inadvertently do something that works around the bug. I've personally reported 17 bugs in GCC over the last 2 years and there are currently 1241 open wrong-code bugs.

Here's an example of a simple to understand bug (not mine) in the C frontend that has existed since GCC 4.7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180

grey-area 1mo ago

These are still deterministic bugs, which is the point the OP was making. They can be found and solved once. Most of those bugs are simply not that important, so they never get attention.

LLMs, on the other hand, are non-deterministic, unpredictable and fuzzy by design. That makes them less than ideal when trying to produce output which is provably correct. Sure, you can generate output and then laboriously check it. Some people find that useful, some are yet to find it useful.

It's a little like using Bitcoin to replace currencies - sure you can do that, but it includes design flaws which make it fundamentally unsuited to doing so. 10 years ago we had rabid defenders of these currencies telling us they would soon take over the global monetary system and replace it, nowadays, not so much.

zx8080 1mo ago

> It's a little like using Bitcoin to replace currencies [...]

At least, Bitcoin transactions are deterministic.

Not many would want to use an AI currency (mostly works; always shows "Oh, you are 100% right" after losing one's money).


throw10920 1mo ago

> I've personally reported 17 bugs in GCC over the last 2 years

You are an extreme outlier. I know about two dozen people who work with C(++) and not a single one of them has ever told me that they've found a compiler bug when we've talked about coding and debugging - it's been exclusively them describing PEBCAK.

usefulcat 1mo ago

I've been using C++ for over 30 years. 20-30 years ago I was mostly using MSVC (including version 6), and it absolutely had bugs, sometimes in handling the language spec correctly and sometimes in code generation.

Today, I use gcc and clang. I would say that compiler bugs are not common in released versions of those (i.e. not alpha or beta), but they do still occur. Although I will say I don't recall the last time I came across a code generation bug.

arvyy 1mo ago

I knew one person reporting gcc bugs, and iirc those were all niche scenarios where it generated slightly suboptimal machine code but not otherwise observable from behavior


rhubarbtree 1mo ago

This argument is disingenuous and distracts rather than addresses the point.

Yes, it is possible for a compiler to have a bug. No, that is in no way analogous to AI producing buggy code.

I’ve experienced maybe two compiler bugs in my twenty year career. I have experienced countless AI mistakes - hundreds? Thousands? Already.

These are not the same and it has the whiff of sales patter trying to address objections. Please stop.

LiamPowell 1mo ago

I'm not arguing that LLMs are at a point today where we can blindly trust their outputs in most applications, I just don't think that 100% correct output is necessarily a requirement for that. What it needs to be is correct often enough that the cost of reviewing the output far outweighs the average cost of any errors in the output, just like with a compiler.

This even applies to human written code and human mistakes, as the expected cost of errors goes up we spend more time on having multiple people review the code and we worry more about carefully designing tests.
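The trade-off described above can be written down as a small expected-cost comparison. This is only a sketch: the probabilities and costs below are hypothetical numbers picked for illustration, and it assumes a review catches every error, which real reviews do not.

```python
def should_review(p_error: float, cost_of_error: float, cost_of_review: float) -> bool:
    """Return True when reviewing costs less than the expected cost of skipping review."""
    expected_error_cost = p_error * cost_of_error
    return cost_of_review < expected_error_cost


# Hypothetical numbers, not measurements of any real compiler or model.
compiler_case = should_review(p_error=1e-6, cost_of_error=10_000, cost_of_review=50)
llm_case = should_review(p_error=0.05, cost_of_error=10_000, cost_of_review=50)

print(compiler_case)  # False: expected error cost is 0.01, far below the review cost
print(llm_case)       # True: expected error cost is 500, well above the review cost
```

Under these made-up inputs the same rule says "skip review" for the compiler and "review" for the LLM, which is the sense in which 100% correctness is not the deciding requirement.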


dbtablesorrows 1mo ago

the fact that the bug tracker exists is proving GP's point.

eklavya 1mo ago

Right, now what would you say is the probability of getting a bug in compiler output vs ai output?

It's a great tool, once it matures.

rootnod3 1mo ago

Absolutely this. I am tired of that trope.

Or the argument that "well, at some point we can come up with a prompt language that does exactly what you want and you just give it a detailed spec." A detailed spec is called code. It's the most round-about way to make a programming language that even then is still not deterministic at best.

wtetzner 1mo ago

And at the point that your detailed specification language is deterministic, why do you need AI in the middle?

rootnod3 1mo ago

Exactly the point. AI is absolutely BS that just gets peddled by shills. It does not work. It might work for some JS bullcrap. But take existing code and ask it to add capsicum next to an ifdef of pledge. Watch the mayhem unfold.

andai 1mo ago

This is obviously beside the point but I did blindly follow a wiener schnitzel recipe ChatGPT made me and cooked for a whole crew. It turned out great. I think I got lucky though, the next day I absolutely massacred the pancakes.

D-Machine 1mo ago

I genuinely admire your courage and willingness (or perhaps just chaos energy) to attempt both wiener schnitzel and pancakes for a crew, based on AI recipes, despite clearly limited knowledge of either.

bonesss 1mo ago

Recent experiments with LLM recipes (ChatGPT): missed salt in a recipe to make rice, then flubbed whether that type of rice was recommended to be washed in the recipe it was supposedly summarizing (and lied about it, too)…

Probabilistic generation will be weighted towards the means in the training data. Do I want my code looking like most code most of the time in a world full of Node.js and PHP? Am I better served by rapid delivery from a non-learning algorithm that requires eternal vigilance and critical re-evaluation or with slower delivery with a single review filtered through a meatspace actor who will build out trustable modules in a linear fashion with known failure modes already addressed by process (i.e. TDD, specs, integration & acceptance tests)?

I’m using LLMs a lot, but can’t shake the feeling that the TCO and total time shake out worse than it feels as you go.

andai 1mo ago

There was a guy a few months ago who found that telling the AI to do everything in a single PHP file actually produced significantly better results, i.e. it worked on the first try. Otherwise it defaulted to React, 1GB of node modules, and a site that wouldn't even load.

>Am I better served

For anything serious, I write the code "semi-interactively", i.e. I just prompt and verify small chunks of the program in rapid succession. That way I keep my mental model synced the whole time, I never have any catching up to do, and honestly it just feels good to stay in the driver's seat.

D-Machine 1mo ago

Pro-tip: Do NOT use LLMs to generate recipes, use them to search the internet for a site with a trustworthy recipe, for information on cooking techniques, science, or chemistry, or if you need ideas about pairings and/or cooking theory / conventions. Do not trust anything an LLM says if it doesn't give a source, it seems people on the internet can't cook for shit and just make stuff up about food science and cooking (e.g. "searing seals in the moisture", though most people know this is nonsense now), so the training data here is utterly corrupt. You always need to inspect the sources.

I don't even see how an LLM (or frankly any recipe) that is a summary / condensation of various recipes can ever be good, because cooking isn't something where you can semantically condense or even mathematically combine various recipes together to get one good one. It just doesn't work like that, there is just one secret recipe that produces the best dish, and the way to find this secret recipe is by experimenting in the real world, not by trying to find some weighting of a bunch of different steps from a bunch of different recipes.

Plus, LLMs don't know how to judge quality of recipes at all (and indeed hallucinate total nonsense if they don't have search enabled).


bostik 1mo ago

Everything more complex than a hello-world has bugs. Compiler bugs are uncommon, but not that uncommon. (I must have debugged a few ICEs in my career, but luckily have had more skilled people to rely on when code generation itself was wrong.)

Compilers aren't even that bad. The stack goes much deeper and during your career you may be (un)lucky enough to find yourself far below compilers: https://bostik.iki.fi/aivoituksia/random/developer-debugging...

NB. I've been to vfs/fs depths. A coworker relied on an oscilloscope quite frequently.

nneonneo 1mo ago

I had a fun bug while building a smartwatch app that was caused by the sample rate of the accelerometer increasing when the device heated up. I had code that was performing machine learning on the accelerometer data, which would mysteriously get less accurate during prolonged operation. It turned out that we gathered most of our training data during shorter runs when the device was cool, and when the device heated up during extended use, it changed the frequencies of the recorded signals enough to throw off our model.
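A rough sketch of why a drifting sample rate shifts the frequencies a model sees: if samples are captured at one rate but interpreted as if they were captured at the nominal rate, every frequency is scaled by the ratio of the two. The function name and the numbers are hypothetical, not taken from the actual smartwatch app.

```python
def apparent_frequency(f_true_hz: float, fs_nominal_hz: float, fs_actual_hz: float) -> float:
    """Frequency a signal appears to have when samples taken at fs_actual_hz
    are interpreted as if they had been taken at fs_nominal_hz."""
    # Sample n is really taken at time n / fs_actual, but is treated as time n / fs_nominal,
    # so the time axis is stretched and frequencies scale by fs_nominal / fs_actual.
    return f_true_hz * fs_nominal_hz / fs_actual_hz


# Hypothetical values: a 2 Hz motion, a 50 Hz nominal rate, and a warm device running at 55 Hz.
cool = apparent_frequency(2.0, 50.0, 50.0)
warm = apparent_frequency(2.0, 50.0, 55.0)

print(cool)            # 2.0
print(round(warm, 3))  # 1.818
```

A model trained only on data from the cool device would never have seen the shifted value, which matches the gradual loss of accuracy described above.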

I've also used a logic analyzer to debug communications protocols quite a few times in my career, and I've grown to rather like that sort of work, tedious as it may be.

Just this week I built a VFS using FUSE and managed to kernel panic my Mac a half-dozen times. Very fun debugging times.

pcl 1mo ago

"I've never personally found a compiler bug."

I remember the time I spent hours debugging a feature that worked on Solaris and Windows but failed to produce the right results on SGI. Turns out the SGI C++ compiler silently ignored the `throw` keyword! Just didn’t emit an opcode at all! Or maybe it wrote a NOP.

All I’m saying is, compilers aren’t perfect.

I agree about determinism though. And I mitigate that concern by prompting AI assistants to write code that solves a problem, instead of just asking for a new and potentially different answer every time I execute the app.

Ygg2 1mo ago

Compilers don't change output assembly based on what markdown you provide them via .claude.

Or what tone of voice in prompt you gave them. Or if Saturn is in Aries or Sagittarius.

idopmstuff 1mo ago

> Meanwhile AI can't be trusted to give me a recipe for potato soup.

This just isn't true any more. Outside of work, my most common use case for LLMs is probably cooking. I used to frequently second guess them, but no longer - in my experience SOTA models are totally reliable for producing good recipes.

I recognize that at a higher level we're still talking about probabilistic recipe generation vs. deterministic compiler output, but at this point it's nonetheless just inaccurate to act as though LLMs can't be trusted with simple (e.g. potato soup recipe) tasks.

bayindirh 1mo ago

Compilers and processors are deterministic by design. LLMs are non-deterministic by design.

It's not apples vs. oranges. They are literally opposite of each other.
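A toy contrast between the two behaviours, for illustration only. `toy_compile` and `toy_sample` are hypothetical stand-ins, not how any real compiler or model works, and the token weights are made up. The sampler is only reproducible if you pin the random seed.

```python
import random


def toy_compile(source: str) -> str:
    """Stand-in for a deterministic tool: same input, same output, every time."""
    return source.strip().upper()


def toy_sample(weights: dict[str, float], rng: random.Random) -> str:
    """Stand-in for sampled generation: draw one token according to its weight."""
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


# Hypothetical next-token weights, made up for illustration.
weights = {"salt": 0.6, "sugar": 0.3, "cement": 0.1}

rng = random.Random()  # unseeded, so each run can differ
compiled = {toy_compile("mov eax, 1") for _ in range(200)}
sampled = {toy_sample(weights, rng) for _ in range(200)}

print(len(compiled))  # always 1
print(len(sampled))   # usually more than 1
```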

Scene_Cast2 1mo ago

Just to nitpick - compilers (and, to some extent, processors) weren't deterministic a few decades ago. Getting them to be deterministic has been a monumental effort - see build reproducibility.
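One common source of non-reproducible builds is a timestamp embedded in the output. A minimal sketch, assuming a hypothetical `toy_build` function in place of a real toolchain: pinning the timestamp makes two builds of the same source hash identically. The parameter name echoes the `SOURCE_DATE_EPOCH` convention from the Reproducible Builds project.

```python
import hashlib
import time
from typing import Optional


def toy_build(source: str, source_date_epoch: Optional[int] = None) -> bytes:
    """Hypothetical 'build' that embeds a timestamp in its output."""
    # Without a pinned epoch, the current time leaks into the artifact.
    stamp = source_date_epoch if source_date_epoch is not None else time.time_ns()
    return f"built_at={stamp}\n{source}".encode()


def digest(artifact: bytes) -> str:
    """SHA-256 of the artifact, used to compare two builds byte for byte."""
    return hashlib.sha256(artifact).hexdigest()


source = "int main(void) { return 0; }"

# Pinned timestamp: the two builds are identical.
pinned_a = digest(toy_build(source, source_date_epoch=1_700_000_000))
pinned_b = digest(toy_build(source, source_date_epoch=1_700_000_000))
print(pinned_a == pinned_b)  # True

# Unpinned: the digest depends on when the build ran.
unpinned = digest(toy_build(source))
```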

anematode 1mo ago

I'm trying to track down a GCC miscompilation right now ;)

keyle 1mo ago

I feel for you :D

wtetzner 1mo ago

> The compiler metaphor is simply incorrect

If an LLM was analogous to a compiler, then we would be committing prompts to source control, not the output of the LLM (the "machine code").

jen729w 1mo ago

> Meanwhile AI can't be trusted to give me a recipe for potato soup.

Because there isn’t a canonical recipe for potato soup.

lebuin 1mo ago

There's also no canonical way to write software, so in that sense generating code is more similar to coming up with a potato soup recipe than compiling code.

Jensson 1mo ago

That is not the issue, any potato soup recipe would be fine, the issue is that it might fetch values from different recipes and give you an abomination.

D-Machine 1mo ago

This exactly. I cook as a passion, and LLMs routinely and very clearly (weighted) "average" together different recipes to produce, in the worst case, disgusting monstrosities, or, in the best case, just a near-replica of some established site's recipe.


keyle 1mo ago

You're correct, and I believe this is only a matter of time. Over time it has been getting better and will keep doing so.

blks 1mo ago

It won’t be deterministic.

wtetzner 1mo ago

The input to LLMs is natural language. Natural language is ambiguous. No amount of LLM improvements will change that.

bigstrat2003 1mo ago

Maybe. But it's been 3 years and it still isn't good enough to actually trust. That doesn't raise confidence that it will ever get there.

keyle 1mo ago

You need to put this revolution in scale with other revolutions.

How long did it take for horses to be superseded by cars?

How long did power tools take to become the norm for tradesmen?

This has gone unbelievably fast.


senko 1mo ago

> Compilers will produce working output given working input literally 100% of my time in my career. I've never personally found a compiler bug.

First compilers were created in the fifties. I doubt those were bug-free.

Give LLMs some fifty or so years, then let's see how (un)reliable they are.

wtetzner 1mo ago

What I don't understand about these arguments is that the input to the LLMs is natural language, which is inherently ambiguous. At which point, what does it even mean for an LLM to be reliable?

And if you start feeding an unambiguous, formal language to an LLM, couldn't you just write a compiler for that language instead of having the LLM interpret it?

senko 1mo ago

1) Determinism isn't the same as reliability.

Compilers are deterministic (modulo bugs). Most things in life are not, yet they can still be reliable.

The opposite also holds: "npm install && npm run build" can work today and fail in a year (due to ecosystem churn) even though every single component in that chain is deterministic.

2) Reliability is a continuum, not a discrete yes/no. In practice, we want things to be reliable enough (where "enough" is determined per domain).

I don't presume this will immediately change your mind, but hopefully will open your eyes to looking at this a bit differently.

