Perhaps you remember that language models were completely useless at coding some years ago, and now they can do quite a lot of things, even if they are not perfect. That is progress, and that does give reason to extrapolate.
Unless of course you mean something very special with "solving programming".
IMO, they're still useless today, with the only progress being that they can produce a more convincing facade of usefulness. I wouldn't call that very meaningful progress.
Clearly, statistical models trained on this HN thread would output that sequence of tokens with high probability. Are you suggesting that a statement being probable in a text corpus is not a legitimate source of truth? Can you generalize that a little bit?
But for small personal projects? Yes, helpful.
x10 of zero is still zero, I guess.
LLMs can only give you code that somebody has written before. This is inherent. That's useful for a bunch of stuff, but that bunch won't change if OpenAI decides to spend the GDP of Germany training one instead of that of Costa Rica.
This is trivial to prove to be false.
Invent a programming language that does not exist. Describe its semantics to an LLM. Ask it to write a program to solve a problem in that language. It will not always work, but it will work often enough to demonstrate that they are very much capable of writing code that has never been written before.
The first time I tried this was with GPT3.5, and I had it write code in an unholy combination of Ruby and INTERCAL, and it had no problems doing that.
Similarly, giving it the grammar of a hypothetical language and asking it to generate valid text in that language, one that has never existed before, also works reasonably well.
This notion that LLMs only spit out things that have been written before might have been reasonable to believe a few years ago, but it hasn't been a reasonable position to hold for a long time at this point.
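For anyone who wants to try the experiment themselves, here's a hedged sketch of the kind of prompt involved; the mini-language ("Blub-77"), its rules, and the task are all made up on the spot, which is exactly the point:

    # Build a prompt describing a language that did not exist before this moment.
    # How you send it to a model (chat UI, API call, whatever) is up to you.
    SPEC = """
    Blub-77 is a stack language. Its only tokens are:
      push <n>          -- push the integer n
      dup               -- duplicate the top of the stack
      emit              -- pop the top of the stack and print it
      loop <k> { ... }  -- run the enclosed program k times
    """

    TASK = "Write a Blub-77 program that prints the number 3 four times."

    prompt = SPEC + "\n" + TASK
    print(prompt)

A correct answer (e.g. push 3, then a loop that dups and emits) can't simply be retrieved from the training data, because the language was defined in the prompt.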
This premise is false. It is fundamentally equivalent to the claim that a language model trained on the dataset ["ABA", "ABB"] would be unable, given the input "B", to generate the string "BAB" or "BAA".
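To make the toy concrete: even a character-level bigram model, about the simplest statistical model of text there is (and nothing like how an actual LLM is trained; this is purely illustrative), assigns nonzero probability to strings it never saw, just by recombining transitions it did see:

    from collections import defaultdict

    # Maximum-likelihood bigram model over the toy corpus.
    corpus = ["ABA", "ABB"]
    counts = defaultdict(lambda: defaultdict(int))
    for word in corpus:
        for a, b in zip(word, word[1:]):
            counts[a][b] += 1

    def prob(string):
        # Probability of each next character given the previous one.
        p = 1.0
        for a, b in zip(string, string[1:]):
            total = sum(counts[a].values())
            p *= counts[a][b] / total if total else 0.0
        return p

    print(prob("ABA"))  # 0.5 -- seen verbatim in training
    print(prob("BAB"))  # 0.5 -- never seen, yet probability > 0
    print(prob("BAA"))  # 0.0 under pure maximum likelihood; smoothing the counts would make this nonzero too

"BAB" never occurs in the data, but B->A and A->B both do, so the model composes them.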
Your claim here is slightly different.
You're claiming that if a token isn't in the support, it can't be output [1]. But we can easily get around this by adding minimal support for all tokens, so that a previously unseen token (the "C" in the example below) can appear, at least in theory. Such support addition shows up all the time in the AI literature [2].
[1]: https://en.wikipedia.org/wiki/Support_(mathematics)
[2]: In some regimes, like game-theoretic learning, support is baked explicitly into the solving algorithms during the learning stage. In others, like reinforcement learning, it's accomplished by making the policy a function of two objectives: an exploration objective and an exploitation objective. The fact that cross-pollination already occurs between LLMs in the pre-trained unsupervised regime and LLMs in the post-training fine-tuning regime (via forms of reinforcement learning) should make anyone versed in the ML literature hesitate to claim that such support addition is unreasonable.
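To give a feel for what "adding support" means in the simplest possible setting (this is just Laplace smoothing on a count model, an illustrative stand-in for the fancier mechanisms above):

    VOCAB = ["A", "B", "C"]

    def smoothed_prob(counts, context, token, alpha=1.0):
        # Add alpha pseudo-counts for every vocabulary token, so no token in
        # the vocabulary ever has exactly zero probability in any context.
        seen = counts.get(context, {})
        total = sum(seen.get(t, 0) for t in VOCAB)
        return (seen.get(token, 0) + alpha) / (total + alpha * len(VOCAB))

    counts = {"A": {"A": 5, "B": 3}}          # "C" never observed after "A"
    print(smoothed_prob(counts, "A", "C"))    # ~0.09: "C" is now in the support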
Edit:
Got downvoted, so I figure maybe people don't understand. Here is the simple counterexample. Consider an evaluator that gives rewards: F("AAC") = 1, all other inputs = 0. Consider a tokenization that defines "A", "B", "C" as tokens, but a training dataset from which the letter C is excluded but the item "AAA" is present.
After training, "AAA" exists in the output space of the language model, but "AAC" does not. Without support, without exploration, if you train the language model against the reinforcement learning reward model F, you may never gain the ability to output "C"; but with support, the sequence "AAC" can be generated and earn a reward. Now actually do this. You get a new language model. Since "AAC" was rewarded, it is now within the space of the LLM's outputs. Yet it doesn't appear in the training dataset, and there are many reward models F for which no person ever had to output the string "AAC" for the reward model to reward it.
It follows that "C" can appear even though "C" does not appear in the training data.
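Here is a toy, hedged version of that counterexample in code. A tabular three-token policy and a crude reinforce-every-rewarded-token update stand in for an actual LLM and its fine-tuning pipeline; the epsilon mix with a uniform distribution is the support/exploration being added:

    import random
    from collections import defaultdict

    VOCAB = ["A", "B", "C"]
    EPSILON = 0.1        # exploration mix: the added support
    LEARNING_RATE = 0.5

    def reward(seq):
        # The reward model F: only "AAC" pays off.
        return 1.0 if seq == "AAC" else 0.0

    # "Pretrained" preferences: the data contained "AAA" but never "C",
    # so the starting policy puts no weight on "C" at all.
    weights = defaultdict(float, {"A": 5.0, "B": 1.0, "C": 0.0})

    def sample_token():
        # Mix the learned policy with a uniform draw over the vocabulary.
        if random.random() < EPSILON:
            return random.choice(VOCAB)
        total = sum(weights[t] for t in VOCAB)
        r = random.random() * total
        for t in VOCAB:
            r -= weights[t]
            if r <= 0:
                return t
        return VOCAB[-1]

    for _ in range(2000):
        seq = "".join(sample_token() for _ in range(3))
        if reward(seq) > 0:
            # Reinforce every token of a rewarded sequence.
            for t in seq:
                weights[t] += LEARNING_RATE

    print(dict(weights))  # "C" now carries weight, despite never appearing in the data

Without the epsilon mix, "C" can never be sampled, never rewarded, and never learned; with it, "AAC" eventually gets drawn, rewarded, and pulled into the model's output distribution. Same data, different support.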
And secondly, what you say is false (at least if taken literally). I can create a new programming language, give its definition in the prompt, ask the model to code something in my language, and get something out. It might even work.
I literally just pointed out the same thing without having seen your comment.
Second this. I've done this several times, and it can handle it well. Already GPT3.5 could easily reason about hypothetical languages given a grammar or a loose description.
I find it absolutely bizarre that people still hold on to this notion that these models can't do anything new, because it seems implausible that they have actually tried, given how well it works.
A lot of that is because we use libraries for the 'done frequently before' code. I don't generate a database driver for my webapp with an LLM.
But how much of enterprise programming is 'get some data from a database, show it on a Web page (or gui), store some data in the database', with variants?
It makes sense that we have libraries for abstracting away some common things. But it also makes sense that we can't abstract away everything we do multiple times, because at some point it just becomes so abstract that it's easier to write it yourself than to try to configure some library. That doesn't mean it's not a variant of something done before.
Lots of programming doesn't have one specific right answer, but a bunch of possible right answers with different trade-offs. The programmer's job isn't necessarily just to get working code. I don't think we are at the point where LLMs can see the forest for the trees, so to speak.
Set rules on what’s valid, which most languages already do; omit generation of known code; generate everything else
The computer does the work, programmers don’t have to think it up.
A typed-language example to explain: generate valid func sigs
func f(int1, int2) return int{}
If that's the only func sig in our starting set, then it's obvious what counts as new: relative to our tiny starter set, func f(int1, int2, int3) return int{} is novel
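A hedged sketch of that idea (the single type, the arity cap, and the "known" set are made-up illustration, not a real tool): enumerate every signature the rules allow, subtract the ones already in the starting set, and whatever remains is novel by construction:

    from itertools import product

    TYPES = ["int"]            # keep the toy language to one type
    MAX_ARITY = 3
    known = {("int", "int")}   # the one starting sig: func f(int1, int2) return int{}

    for arity in range(1, MAX_ARITY + 1):
        for params in product(TYPES, repeat=arity):
            if params in known:
                continue       # omit generation of known code
            args = ", ".join(f"{t}{i + 1}" for i, t in enumerate(params))
            print(f"func f({args}) return int{{}}")

    # prints, among others: func f(int1, int2, int3) return int{}

The rules define the space, the known set gets subtracted, the computer does the rest.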
This Redis post is about fixing a prior decision of a random programmer. A linguistic decision.
That's why LLMs seem worse than programmers: we make linguistic decisions that fit social idioms.
If we just want to generate all the code this model has never seen before, we don't need a programmer. If we need to abide by the flexible, language-like laws, that's what a programmer is for: composing not just code, but code that complies with ground truth.
That antirez is good at Redis is a bias since he has context unseen by the LLM. Curious how well antirez would do with an entirely machine generated Redis-clone that was merely guided by experts. Would his intuition for Redis’ implementation be useful to a completely unknown implementation?
He’d make a lot of newb errors and need mentorship, I’m guessing.
Read the article; his younger self failed to see logic needed now. Add that onion peel. No such thing as perfect clairvoyance.
Even Yann LeCun’s energy based models driving robots have the same experience problem.
Make a computer that can observe all of the past and future.
Without perfect knowledge our robots will fail to predict some composition of space time before they can adapt.
So there's no probe we can launch that can survive forever, in general, on nothing more than our best guess at launch time.
More people need to study physical experiments and physics and not the semantic rigor of academia. No matter how many ideas we imagine there is no violating physics.
Pop culture seems to have people feeling starship Enterprise is just about to launch from dry dock.
Programming has become vastly more efficient in terms of programmer effort over decades, but making some aspects of the job more efficient just means all your effort is spent on what didn't improve.
No, I don't remember that. They are doing similar things now to what they did 3 yrs ago. They were still a decent rubber duck 3 yrs ago.