Better HN

huac · 0 points · 1y ago

That comment refers to test-time inference, i.e. what the model is prompted with, not what it is trained on. This is, of course, also a tricky problem (especially over long contexts, i.e. the needle-in-a-haystack setting), but it should be much easier than memorization.

Anyway, another interpretation is that the model also needs to decide whether the code in the issue is a reliable fix or not.

sebzim4500 · 1y ago

Then I don't understand what he's suggesting. It is obviously not the case that a third of the questions in the SWE-bench dataset include the solution as part of the issue that is provided to the model. You can just download it and look. The solution is likely in the training data, though.
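In the spirit of "download it and look", one rough heuristic is to count how many lines a reference fix adds that already appear verbatim in the issue text. This is a hypothetical sketch, not anything from SWE-bench itself: it assumes each instance exposes the issue as a `problem_statement` string and the gold fix as a unified diff in a `patch` field, and the helper names and the 10-character threshold are made up for illustration. Verbatim matching misses paraphrased or partial fixes, so at best it gives a lower bound on how often the solution is leaked in the prompt.

```python
# Hypothetical helpers; assumes the instance's `patch` field is a unified diff
# and `problem_statement` is the raw issue text shown to the model.

def added_lines(patch: str, min_len: int = 10) -> list[str]:
    """Return the non-trivial lines a unified diff adds (lines starting with '+')."""
    lines = []
    for raw in patch.splitlines():
        # Skip the '+++ b/path' file header; keep only real additions.
        if raw.startswith("+") and not raw.startswith("+++"):
            text = raw[1:].strip()
            # Ignore blank or very short lines such as '}' or 'else:'.
            if len(text) >= min_len:
                lines.append(text)
    return lines


def leak_fraction(problem_statement: str, patch: str) -> float:
    """Fraction of the patch's added lines that appear verbatim in the issue text."""
    added = added_lines(patch)
    if not added:
        return 0.0
    hits = sum(1 for line in added if line in problem_statement)
    return hits / len(added)


# Toy example (made-up issue and patch, not a real SWE-bench instance).
issue = "Calling foo() crashes. Changing it to `return value or default` fixes it."
patch = """--- a/pkg/mod.py
+++ b/pkg/mod.py
@@ -1,2 +1,2 @@
 def foo(value, default):
-    return value
+    return value or default
"""
print(leak_fraction(issue, patch))  # prints 1.0
```

Averaging `leak_fraction` over the whole dataset, or counting instances above some threshold, would give a quick sanity check on a "1/3 of the questions" style claim.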
