But when I say AlphaCode/Copilot is good, I'm referring solely to the difficulty of the problems they are tackling. Many papers, including mine, worked on simpler problems and leaned on the extra structure those problems provide.
I expect follow-up work will incorporate other knowledge more heavily into the model. My work was mainly on restricting tree-like models to only make predictions that follow the grammar of the language. Does that parallelize/fit well with a transformer? Unsure, but I would expect some language information/genuine problem constraints to be incorporated in future work.
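Roughly, the constraint idea looks something like this (a toy sketch, not my actual implementation; the grammar table, vocab, and model stub here are all made up): at each decoding step, mask the model's logits so only tokens the grammar allows can be chosen.

```python
import numpy as np

VOCAB = ["x", "+", "*", "(", ")", "<eos>"]
TOK = {t: i for i, t in enumerate(VOCAB)}

# Toy grammar for arithmetic expressions, written as "which tokens may
# follow which": after an operand we expect an operator, a close paren,
# or end-of-sequence, and so on. None marks the start of the sequence.
ALLOWED_AFTER = {
    None: {"x", "("},
    "x":  {"+", "*", ")", "<eos>"},
    "+":  {"x", "("},
    "*":  {"x", "("},
    "(":  {"x", "("},
    ")":  {"+", "*", ")", "<eos>"},
}

def model_logits(prefix):
    """Stand-in for a real model; returns arbitrary scores over the vocab."""
    rng = np.random.default_rng(len(prefix))
    return rng.normal(size=len(VOCAB))

def constrained_decode(max_len=10):
    prefix, depth = [], 0
    while len(prefix) < max_len:
        last = prefix[-1] if prefix else None
        allowed = set(ALLOWED_AFTER[last])
        if depth == 0:
            allowed.discard(")")       # no unmatched close paren
        else:
            allowed.discard("<eos>")   # can't stop inside parens
        # Mask everything the grammar forbids, then pick greedily
        # among the remaining legal tokens.
        mask = np.full(len(VOCAB), -np.inf)
        for t in allowed:
            mask[TOK[t]] = 0.0
        tok = VOCAB[int(np.argmax(model_logits(prefix) + mask))]
        if tok == "<eos>":
            break
        depth += tok == "("
        depth -= tok == ")"
        prefix.append(tok)
    return " ".join(prefix)

print(constrained_decode())  # e.g. "( x * x ) + x"
```

The point is just that every prediction stays inside the grammar by construction; whether that kind of per-step masking plays nicely with transformer-style parallel training is exactly the open question.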
Honestly, I am pretty surprised how far pure brute force with a large model is going. I would not have expected GPT-3-level language modeling from just scaling up a transformer and little else.