Also, my understanding is that Dreamcoder does some fancy PL theory stuff to factorize blocks of code with identical behavior into functions. Honestly I think that’s the key advance in the paper, more than the wake-sleep algorithm they focus on.
Anyway, the point was more that self-supervised learning is quite applicable to learning to program. I think the downside is that the model learns its own weird, non-idiomatic conventions, rather than copying GitHub.
Yes, it's possible to apply self-supervised learning to program synthesis, because it's possible to generate programs. It's possible to generate _infinite_ sets of programs. The problem is that if you make a generator with Universal Turing Machine expressivity, you're left with an intractable search over an infinite search space. And if you don't generate an infinite set of programs, then you're left with an incomplete search over a space that may not include your target program. In the latter case you need to make sure that your generator can generate the programs you're looking for, which is possible, but it limits the approach to only generating certain kinds of programs. In the end, the easiest thing is to create a generator for programs that you already know how to write, and no others. How useful that is remains an open question. So far no artificial system has ever made an algorithmic contribution, to my knowledge, in the sense of coming up with a new algorithm for a problem for which we don't have good algorithms, or coming up with an algorithm for a problem we can't solve at all.
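To make the trade-off concrete, here's a minimal sketch of generate-and-test synthesis over a deliberately restricted DSL (the primitives and names are illustrative, not from any particular paper). The generator can only emit pipelines of these three primitives, so the search is finite and tractable, but any target function outside that space is simply unreachable:

```python
# Generate-and-test synthesis over a tiny, hand-crafted DSL.
# The generator is complete only for programs expressible as
# pipelines of these primitives -- nothing else can be found.
from itertools import product

PRIMITIVES = {
    "add1":   lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}

def generate(max_depth):
    """Enumerate all pipelines of primitives up to max_depth steps."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            yield names

def run(names, x):
    for name in names:
        x = PRIMITIVES[name](x)
    return x

def synthesize(examples, max_depth=4):
    """Return the first enumerated program consistent with all I/O examples."""
    for prog in generate(max_depth):
        if all(run(prog, i) == o for i, o in examples):
            return prog
    return None  # target program lies outside the (finite) search space

# Target: f(x) = (x + 1) * 2
print(synthesize([(1, 4), (2, 6), (5, 12)]))  # -> ('add1', 'double')
```

Note how the cost blows up: the number of candidates is 3^1 + 3^2 + ... + 3^depth, which is exactly the intractable-search problem once the DSL grows toward Turing-complete expressivity.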
My perception is influenced by my studies, of course, but for me, a more promising approach than the generate-and-test approach exemplified by DreamCoder and AlphaCode etc. is Inductive Programming, which is to say, program synthesis from input-output examples only, without examples of _programs_ (the AlphaCode paper says that is an easier setting, but I strongly disagree). Instead of generating a set of candidate programs and trying to find a program that agrees with the I/O examples, you have an inference procedure that generates _only_ the programs that agree with the I/O examples. In that case you don't need to hand-craft or learn a generator. But you do need to impose an inductive bias on the inference procedure that restricts the hypothesis language, i.e. the form of the programs that can be learned. And then you're back to worrying about infinite vs. incomplete search spaces. But there may be ways around that, ways not available to purely search-based systems.
Anyway, program synthesis is a tough nut to crack and I don't think that language models can do the job, just like that. The work described in the article above, despite all the fanfare about "reasoning" and "critical thinking", is only preliminary and its results are not all that impressive. At least not yet. We shall see. After all, DeepMind has deep resources and they may yet surprise me.
- First, some more recent work, mostly overviews.
1. The following is the most recent overview of the field I'm aware of:
Inductive logic programming at 30 (Cropper et al, 2020)
https://www.doc.ic.ac.uk/~shm/Papers/ilp30.pdf
2. And a slightly shorter version of the same paper that summarises new trends:
Turning 30: New Ideas in Inductive Logic Programming (Cropper et al, 2020)
https://www.ijcai.org/Proceedings/2020/0673.pdf
3. Here's a short introduction to the relatively new ILP direction of learning Answer Set Programming:
Inductive Logic Programming in Answer Set Programming (Corapi et al, 2011)
https://link.springer.com/chapter/10.1007/978-3-642-31951-8_...
4. This is an overview of Meta-Interpretive Learning (MIL), a new approach to ILP that overcomes many difficulties of earlier approaches (Full disclosure: my own work is on MIL, though not the article linked):
Meta-Interpretive Learning: achievements and challenges (Stephen Muggleton, 2017)
https://www.doc.ic.ac.uk/~shm/Papers/rulemlabs.pdf
5. And this is a short version of a paper on δILP, a neural-net based ILP system:
Learning Explanatory Rules from Noisy Data (Evans and Grefenstette, 2018)
https://www.ijcai.org/Proceedings/2018/0792.pdf
- Next, some earlier work that is still relevant:
6. This is the inaugural paper of the field, which first named it (a little heavy reading though):
Inductive Logic Programming (Stephen Muggleton, 1990)
https://www.doc.ic.ac.uk/~shm/Papers/ilp.pdf
7. Here's an early paper on predicate invention, an important technique in ILP (only recently fully realised via MIL):
Predicate Invention in ILP - an Overview (Irene Stahl, 1993)
https://link.springer.com/chapter/10.1007%2F3-540-56602-3_14...
8. And an early overview of learning recursion (and performing predicate invention) that also lists several early ILP systems:
Inductive synthesis of recursive logic programs: achievements and prospects (Flener and Yilmaz, 1999)
https://core.ac.uk/download/pdf/82810434.pdf
That should be enough to get you started. I recommend reading the articles in the order I linked them. I tried to give links to documents that I know can be read for free.
Unfortunately most of the material on ILP is either in scholarly articles, or, where there are textbooks, they tend to be older. That sounds bad, but there has been much new work recently with several new approaches.
Let me know if you're looking for more specific information. See my signature for contact details; I'm happy to answer emails about ILP :)