The issue, though, is that the line between in-domain and out-of-domain is fuzzy, which suggests that generalization lies on a continuum. ChatGPT has seen enough UI framework code that it can interpolate between concepts. That is a form of generalization, but people are usually looking for a lot more. I think a better way to test generalization would be to train a model only on C++ and then see how much it can do in Python given just a few examples in the prompt (few-shot).
Another important thing to keep in mind is a paper (I wish I could remember which one) showing that even large-scale LLMs have trouble understanding that A=B is the same as B=A if they have not seen A or B before.