that comment refers to the test time inference, i.e. what the model is prompted with, not to what it is trained on. this is, of course, also a tricky problem (esp over long context, needle in a haystack), but it should be much easier than memorization.
anyways, another interpretation is that the model needs to also make a decision on if the code in the issue is a reliable fix or not too