It's like self-driving cars. A car driving itself for the first time in a controlled environment, I'm sure, was an impressive feat, and it wouldn't be inaccurate to call it a self-driving car. However, that's not what we're all waiting for when we talk about the arrival of self-driving cars.
None of the self-driving systems were set up by giving the AI access to sensors, a car, and the driver's handbook and saying, well, you figure it out from there. The general trend is: solve a greatly simplified problem, then a more complex one, and so on up to dealing with the real world.
A few examples of neural program synthesis from at least 2 years ago:
https://sunblaze-ucb.github.io/program-synthesis/index.html
Another example from June 2020:
DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning
https://arxiv.org/abs/2006.08381
RobustFill, from 2017:
RobustFill: Neural Program Learning under Noisy I/O
https://www.microsoft.com/en-us/research/wp-content/uploads/...
I could go on.
And those are only examples from neural program synthesis. Program synthesis in general is a field that goes way back. I'd suggest, as usual, not making big proclamations about its state of the art without being acquainted with the literature. If you don't know what others have done, every announcement by DeepMind, OpenAI et al. looks like a huge advance... when it really isn't.
https://www.semanticscholar.org/paper/Program-Synthesis-from...
AlphaCode is not particularly good at it, either. In the arXiv preprint, besides the subjective and pretty meaningless "evaluation" against human coders, it's also tested on a formal program synthesis benchmark, the APPS dataset. The best-performing AlphaCode variant reported in the preprint solves 25% of the "introductory" APPS tasks (the least challenging ones). All AlphaCode variants tested solve less than 10% of the "interview" and "competition" (intermediate and advanced) tasks. These more objective results are not reported in the article above, I think for obvious reasons (because they are extremely poor).
So it's not doing anything radically new and it's not doing it particularly well either. Please be better informed before propagating hype.
Edit: really, from a technical point of view, AlphaCode is a brute-force, generate-and-test approach to program synthesis of the kind that was state-of-the-art 40 years ago. It's just a big generator that spams programs hoping it will hit a good one. I have no idea who came up with this. Oriol Vinyals is the last author and I've seen enough of that guy's work to know he knows better than to bet on such a primitive, even backwards approach. I'm really shocked that this is DeepMind work.
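To be concrete about what I mean by generate-and-test, here's a toy sketch of the schema (my own illustration, not AlphaCode's actual pipeline; in AlphaCode the sampler is a large language model and the candidates are real programs, but the control flow is the same old loop):

```python
import random

# Toy expression "grammar" standing in for a learned program generator.
OPS = ["x + 1", "x * 2", "x * x", "x - 1"]

def sample_program():
    # Stand-in for sampling a candidate program from a model.
    body = random.choice(OPS)
    return eval(f"lambda x: {body}")

def generate_and_test(io_examples, budget=1000):
    # The 40-year-old schema: spam candidates, keep the first one
    # that happens to pass all the given I/O examples.
    for _ in range(budget):
        prog = sample_program()
        if all(prog(x) == y for x, y in io_examples):
            return prog
    return None

# Target behaviour: f(x) = x * x
examples = [(2, 4), (3, 9), (5, 25)]
found = generate_and_test(examples)
```

Scaling the generator makes the loop hit more often, but it's still search by lottery, with no use of the decades of work on pruning, deduction, or compositional search from the program synthesis literature.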
So my hunch is that it probably hasn't been done, or hasn't been done often, because the program synthesis community would recognise it's pointless.
What you really want to look at is formal program synthesis benchmarks and how systems like AlphaCode do on them (hint: not so good).