But still, why the difference? With or without the "lol", the data dependence is identical, the predictability is identical. Yet with the "lol" it optimises differently.
My best guess is that it's because of the added cost on the other branch: printing is expensive, so the compiler really doesn't want to do it, and would rather mispredict not-printing than mispredict printing.
If that tradeoff is what causes the difference in behaviour here, then I understand. But I don't like it, because it's ridiculous and we shouldn't have to con the compiler into doing what's right. If we're going to have to do this, I'd rather have the language provide a way to state explicitly which branch we expect.