Kludgey is right, and it won't always be accurate: the numbers might happen to agree on a too-small sample, or diverge on a large-enough one.
But I think it's a very simple, concrete and credible demonstration to users. And if they observe such an experiment over a few trials, they'll quickly develop an intuition for when the sample size is large enough. That is more valuable than an abstract calculation. (Of course, ideally you'd have both.)
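To make that concrete, here is a rough simulation of the kind of trial I have in mind. It assumes the experiment is splitting one ad's traffic into two identical halves and comparing their observed rates. That setup, the 5% rate, and the function names are my own illustration, not anything from a real system.

```python
import random

def observed_gap(true_rate, n, rng):
    """Absolute difference in observed rate between two identical arms
    of n impressions each (both arms share the same true rate)."""
    a = sum(rng.random() < true_rate for _ in range(n))
    b = sum(rng.random() < true_rate for _ in range(n))
    return abs(a - b) / n

def mean_gap(true_rate, n, trials=200, seed=0):
    """Average gap over repeated trials, with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return sum(observed_gap(true_rate, n, rng) for _ in range(trials)) / trials

# Hypothetical 5% rate; the typical gap should shrink as n grows.
for n in (100, 1000, 10000):
    print(n, round(mean_gap(0.05, n), 4))
```

Any single trial can still agree by luck at small n or disagree at large n, which is exactly the inaccuracy mentioned above. It's the average over trials that settles down.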
Unfortunately, the empirical approach doesn't account for how close the probability is to 50% versus the extremes of 0% or 100%, nor for the variance of the population. However, in practice, these are probably similar enough across ads and populations for the effect to be negligible, especially if you give yourself a margin of safety.
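For comparison, the abstract calculation can be sketched with the usual normal approximation for a proportion, n = z² · p(1 − p) / margin². The function name and the example rates are made up for illustration, and the approximation itself gets shaky when p is very close to 0% or 100%.

```python
import math

def required_sample_size(p, margin, z=1.96):
    """Impressions needed so that a roughly 95% interval (z = 1.96) on a
    rate p has half-width `margin`, using the normal approximation
    n = z^2 * p * (1 - p) / margin^2."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

# Same absolute margin (one percentage point), different underlying rates.
for p in (0.5, 0.1, 0.02):
    print(p, required_sample_size(p, margin=0.01))
```

For a fixed absolute margin, 50% is the worst case because p(1 − p) peaks there. If you care about relative precision instead (say, within 10% of the rate itself), it flips, and low rates need far more impressions.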