Apologies if this is old news to everyone, but perhaps the hive mind knows the answer. I was watching the YouTube video "The most complex model we actually understand" by Welch Labs and heard the story about the researcher who left a model training while going on vacation, and the model then learned to generalize after thousands of training steps. But when I try to look up the name of the discoverer, it doesn't seem to have been made public, which seems a shabby way to treat someone. What's the real story?
Who discovered grokking, and why is the name hard to find?
Yes, I did find that paper, but I couldn't tell which of the five authors it was, or whether it was someone not listed as an author. The word "vacation" doesn't appear in the paper.
https://arxiv.org/pdf/2201.02177