Compilers will produce working output given working input literally 100% of my time in my career. I've never personally found a compiler bug.
Meanwhile AI can't be trusted to give me a recipe for potato soup. That is to say, I would under no circumstances blindly follow the output of an LLM I asked to make soup. While I have, every day of my life, gladly sent all of the compiler output to the CPU without ever checking it.
The compiler metaphor is simply incorrect and people trying to say LLMs compile English into code insult compiler devs and English speakers alike.
In my experience this isn't true. People just assume their code is wrong and mess with it until they inadvertently do something that works around the bug. I've personally reported 17 bugs in GCC over the last 2 years and there are currently 1241 open wrong-code bugs.
Here's an example of a simple to understand bug (not mine) in the C frontend that has existed since GCC 4.7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180
LLMS on the other hand are non-deterministic and unpredictable and fuzzy by design. That makes them not ideal when trying to produce output which is provably correct - sure you can output and then laboriously check the output - some people find that useful, some are yet to find it useful.
It's a little like using Bitcoin to replace currencies - sure you can do that, but it includes design flaws which make it fundamentally unsuited to doing so. 10 years ago we had rabid defenders of these currencies telling us they would soon take over the global monetary system and replace it, nowadays, not so much.
At least, Bitcoin transactions are deterministic.
Not many would want to use a AI currency (mostly works; always shows "Oh, you are 100% right" after losing one's money).
You are an extreme outlier. I know about two dozen people who work with C(++) and not a single one of them has ever told me that they've found a compiler bug when we've talked about coding and debugging - it's been exclusively them describing PEBCAK.
Today, I use gcc and clang. I would say that compiler bugs are not common in released versions of those (i.e. not alpha or beta), but they do still occur. Although I will say I don't recall the last time I came across a code generation bug.
Yes, it is possible for a compiler to have a bug. No, that is I’m mo way analogous to AI producing buggy code.
I’ve experienced maybe two compiler bugs in my twenty year career. I have experienced countless AI mistakes - hundreds? Thousands? Already.
These are not the same and it has the whiff of sales patter trying to address objections. Please stop.
This even applies to human written code and human mistakes, as the expected cost of errors goes up we spend more time on having multiple people review the code and we worry more about carefully designing tests.
It's a great tool, once it matures.
Or the argument that "well, at some point we can come up with a prompt language that does exactly what you want and you just give it a detailed spec." A detailed spec is called code. It's the most round-about way to make a programming language that even then is still not deterministic at best.
Probabilistic generation will be weighted towards the means in the training data. Do I want my code looking like most code most of the time in a world full of Node.js and PHP? Am I better served by rapid delivery from a non-learning algorithm that requires eternal vigilance and critical re-evaluation or with slower delivery with a single review filtered through an meatspace actor who will build out trustable modules in a linear fashion with known failure modes already addressed by process (ie TDD, specs, integration & acceptance tests)?
I’m using LLMs a lot, but can’t shake the feeling that the TCO and total time shakes out worse than it feels as you go.
>Am I better served
For anything serious, I write the code "semi-interactively", i.e. I just prompt and verify small chunks of the program in rapid succession. That way I keep my mental model synced the whole time, I never have any catching up to do, and honestly it just feels good to stay in the driver's seat.
I don't even see how an LLM (or frankly any recipe) that is a summary / condensation of various recipes can ever be good, because cooking isn't something where you can semantically condense or even mathematically combine various recipes together to get one good one. It just doesn't work like that, there is just one secret recipe that produces the best dish, and the way to find this secret recipe is by experimenting in the real world, not by trying to find some weighting of a bunch of different steps from a bunch of different recipes.
Plus, LLMs don't know how to judge quality of recipes at all (and indeed hallucinate total nonsense if they don't have search enabled).
Compilers aren't even that bad. The stack goes much deeper and during your career you may be (un)lucky enough to find yourself far below compilers: https://bostik.iki.fi/aivoituksia/random/developer-debugging...
NB. I've been to vfs/fs depths. A coworker relied on an oscilloscope quite frequently.
I've also used a logic analyzer to debug communications protocols quite a few times in my career, and I've grown to rather like that sort of work, tedious as it may be.
Just this week I built a VFS using FUSE and managed to kernel panic my Mac a half-dozen times. Very fun debugging times.
I remember the time I spent hours debugging a feature that worked on Solaris and Windows but failed to produce the right results on SGI. Turns out the SGI C++ compiler silently ignored the `throw` keyword! Just didn’t emit an opcode at all! Or maybe it wrote a NOP.
All I’m saying is, compilers aren’t perfect.
I agree about determinism though. And I mitigate that concern by prompting AI assistants to write code that solves a problem, instead of just asking for a new and potentially different answer every time I execute the app.
Or what tone of voice in prompt you gave them. Or if Saturn is in Aries or Sagittarius.
This just isn't true any more. Outside of work, my most common use case for LLMs is probably cooking. I used to frequently second guess them, but no longer - in my experience SOTA models are totally reliable for producing good recipes.
I recognize that at a higher level we're still talking about probabilistic recipe generation vs. deterministic compiler output, but at this point it's nonetheless just inaccurate to act as though LLMs can't be trusted with simple (e.g. potato soup recipe) tasks.
It's not apples vs. oranges. They are literally opposite of each other.
If an LLM was analogous to a compiler, then we would be committing prompts to source control, not the output of the LLM (the "machine code").
Because there isn’t a canonical recipe for potato soup.
How long did it take for horses to be super-seeded by cars?
How long did powertool take to become the norm for tradesmen?
This has gone unbelievably fast.
First compilers were created in the fifties. I doubt those were bug-free.
Give LLMs some fifty or so years, then let's see how (un)reliable they are.
And if you start feeding an unambiguous, formal language to an LLM, couldn't you just write a compiler for that language instead of having the LLM interpret it?
Compilers are deterministic (modulo bugs), but most things in life are not, but can still be reliable.
The opposite also holds: "npm install && npm run build" can work today and fail in a year (due to ecosystem churn) even though every single component in that chain is deterministic.
2) Reliability is a continuum, not a discreet yes/no. In practice, we want things to be reliable enough (where "enough" is determined per domain).
I don't presume this will immediately change your mind, but hopefully will open your eyes to looking at this a bit differently.