The experiment was fixed at 3 epochs on 1T tokens; the authors didn't decide to "stop" based on any criterion.
> we don’t know which without looking at validation loss, which is like a second set of test data the model hasn’t seen before.
The data I linked does show the validation loss, and it exhibits the same behavior as the training loss.