
0 points | simbolit | 2y ago | 0 comments

TLDR: transformer models (at GPT-2 scale) are great (near-optimal) at interpolating between the cases seen in (pre-)training, but fail at extrapolation as soon as we leave the training domain. The impressive results may owe more to the breadth of the (pre-)training data than to any ability to generalize.
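To make the interpolation/extrapolation distinction concrete, here is a toy sketch. It is not a transformer and not the experiment summarized above: the "model" is just a 1-nearest-neighbour lookup over training samples, and the target function `x * x`, the training range [0, 1], and all names are assumptions picked for illustration.

```python
# Toy illustration only: a 1-nearest-neighbour lookup stands in for a model
# that has effectively memorised its training data. Everything here
# (target function, ranges, names) is a hypothetical choice for the demo.

def target(x):
    return x * x

# Training set: 101 evenly spaced points covering [0, 1].
train_x = [i / 100 for i in range(101)]
train_y = [target(x) for x in train_x]

def predict(x):
    """Return the training label of the closest training input."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def mean_abs_error(xs):
    return sum(abs(predict(x) - target(x)) for x in xs) / len(xs)

# Interpolation: unseen points that fall between training points.
inside = [0.005 + i / 100 for i in range(100)]
# Extrapolation: points well outside the training range.
outside = [2 + i / 100 for i in range(101)]

interp_error = mean_abs_error(inside)
extrap_error = mean_abs_error(outside)

print(f"interpolation error: {interp_error:.4f}")
print(f"extrapolation error: {extrap_error:.4f}")
```

Inside [0, 1] the lookup is nearly exact, because every query sits next to a training point. Outside that range it can only return the value at the boundary, so the error grows with the distance from the training data. Whether a large transformer behaves like this is exactly the question the TLDR is about; the sketch only shows what the two regimes mean.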
