Interestingly, I find that models generalize decently well as long as the "training" (in the human sense of the word) fits within a small enough context window. That is to say, "in-context learning" seems good enough for real use.
But of course, that's not quite "long-term" learning: nothing persists once the examples fall out of context.
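To make concrete what I mean by "in-context learning", here's a minimal sketch: the "training" is just a handful of worked examples packed into the prompt, and the model generalizes from them at inference time. The `complete()` function, the example reviews, and the labels are all stand-ins I've made up; any text-completion or chat endpoint would slot in.

```python
# In-context learning sketch: a few worked examples go into the prompt;
# no weights change, and nothing is retained after the call.

# Hypothetical few-shot examples (made up for illustration).
EXAMPLES = [
    ("great food, terrible service", "mixed"),
    ("best meal I've had all year", "positive"),
    ("cold fries and a rude waiter", "negative"),
]

def build_prompt(query: str) -> str:
    """Format the few-shot examples plus the new query into one prompt."""
    shots = "\n".join(
        f"Review: {text}\nSentiment: {label}" for text, label in EXAMPLES
    )
    return f"{shots}\nReview: {query}\nSentiment:"

def complete(prompt: str) -> str:
    # Placeholder for your model call (OpenAI, llama.cpp, etc.);
    # all the "learning" lives in the prompt, so any endpoint works.
    raise NotImplementedError("wire up your model API here")

if __name__ == "__main__":
    print(build_prompt("the soup was lukewarm"))
```

The caveat above is the flip side of this design: since the examples only exist in the prompt, the "learning" scales with (and is bounded by) the context window.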