
jules 11y ago

Exactly! The problems arise because of the disconnect between what the math is actually saying and what people think the math is saying. Or rather: what people wish it was saying. Frequentist methods give you "if page A performs the same as page B, then the probability of observing something at least as extreme as this measurement is less than X%". In practice we never want to know this information. What people actually want to know is "given this measurement, the probability of page A being better than page B is X%", so they interpret whatever number comes out of the frequentist method like that... wishful thinking.

Just give them 2 posterior distributions of the conversion rate of page A and page B. It may look more daunting than a single number at first, but it's much easier to interpret than that single number that comes out of hypothesis testing, and, you know, it's the information they actually need to make a decision whether to pick page A or page B.
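To make the suggestion concrete, here is a minimal sketch of what "2 posterior distributions" could look like in code. It assumes a Beta prior on each conversion rate (flat Beta(1, 1) by default), and the visitor and conversion counts are made-up numbers for illustration only. The function names are hypothetical, not taken from any library.

```python
import random


def beta_posterior(conversions, visitors, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for a conversion rate.

    With a Beta(prior_a, prior_b) prior and binomial data, the posterior
    is Beta(prior_a + conversions, prior_b + non-conversions).
    """
    return prior_a + conversions, prior_b + (visitors - conversions)


def summarize(alpha, beta, draws=20000, rng=None):
    """Posterior mean plus a sampled 95% credible interval."""
    rng = rng or random.Random(0)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    mean = alpha / (alpha + beta)
    low = samples[int(0.025 * draws)]
    high = samples[int(0.975 * draws)]
    return mean, low, high


def prob_a_beats_b(post_a, post_b, draws=20000, rng=None):
    """Monte Carlo estimate of P(rate_A > rate_B) under both posteriors."""
    rng = rng or random.Random(1)
    wins = sum(
        rng.betavariate(*post_a) > rng.betavariate(*post_b)
        for _ in range(draws)
    )
    return wins / draws


if __name__ == "__main__":
    # Hypothetical data: 120/1000 conversions on A, 100/1000 on B.
    post_a = beta_posterior(120, 1000)
    post_b = beta_posterior(100, 1000)
    print("A:", summarize(*post_a))
    print("B:", summarize(*post_b))
    print("P(A > B):", prob_a_beats_b(post_a, post_b))
```

The last number is the quantity people usually want ("the probability of page A being better than page B"), conditional on the data and on the chosen prior.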


Machow 11y ago

"given this measurement and our prior beliefs, the probability of page A being better than page B is X%"

FTFY ;). I think Bayesian methods add a lot of interpretive power, but I'm not sure that it would help people make a correct interpretation. I suspect that if practitioners are neglecting the difference between a one-sided and two-sided test, they will likely forget (or gloss over) what priors are (and their non-trivial implementation).

I definitely agree that there is a disconnect between the math and its interpretation, though.

jules (OP) 11y ago

In an A/B test where you usually get so much data, priors honestly don't matter much. Just use a flat prior. You'll overestimate the uncertainty a bit, so you may need a couple more data points than necessary but it's still way less than you'd need for a frequentist method. An A/B testing company could even automatically come up with better priors based on A/B tests that their customers have done in the past.
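A rough way to check the "priors don't matter much with lots of data" claim is to compare the posterior under a flat prior with the posterior under an informative one, at a small and a large sample size. The sketch below assumes Beta priors. The informative Beta(10, 90) prior and all counts are hypothetical values chosen for illustration.

```python
def posterior_mean_and_sd(conversions, visitors, prior_a, prior_b):
    """Mean and standard deviation of the Beta posterior."""
    a = prior_a + conversions
    b = prior_b + (visitors - conversions)
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance ** 0.5


def prior_sensitivity(conversions, visitors):
    """Absolute gap between posterior means under two different priors."""
    flat_mean, _ = posterior_mean_and_sd(conversions, visitors, 1, 1)
    # Hypothetical informative prior centered near a 10% conversion rate.
    info_mean, _ = posterior_mean_and_sd(conversions, visitors, 10, 90)
    return abs(flat_mean - info_mean)


if __name__ == "__main__":
    # Hypothetical small and large experiments.
    print("gap with 50 visitors:    ", prior_sensitivity(8, 50))
    print("gap with 50000 visitors: ", prior_sensitivity(6000, 50000))
```

With few visitors the two priors give visibly different answers; with many visitors the gap shrinks toward zero, which is the sense in which the prior stops mattering.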

yummyfajitas 11y ago

Even in the Bayesian case, you need more than 2 posteriors. You need a decision rule. Comparing posteriors is not sufficient.

http://www.bayesianwitch.com/blog/2014/bayesian_ab_test.html
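For readers who want to see what a decision rule on top of two posteriors might look like, here is one common formulation, sketched under assumptions: stop when the expected loss of the better-looking option falls below a threshold, otherwise keep testing. This is not claimed to reproduce the linked article's exact method. The threshold value, the function names, and the Monte Carlo estimate are all illustrative choices.

```python
import random


def expected_loss(post_choice, post_other, draws=50000, seed=0):
    """Monte Carlo estimate of E[max(rate_other - rate_choice, 0)].

    This is the conversion rate we expect to give up, on average,
    by committing to `post_choice` if the other option is really better.
    Each posterior is a (alpha, beta) pair for a Beta distribution.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        chosen = rng.betavariate(*post_choice)
        other = rng.betavariate(*post_other)
        total += max(other - chosen, 0.0)
    return total / draws


def decide(post_a, post_b, threshold=0.001):
    """Pick an option once its expected loss is below the threshold."""
    loss_a = expected_loss(post_a, post_b, seed=0)
    loss_b = expected_loss(post_b, post_a, seed=1)
    if min(loss_a, loss_b) >= threshold:
        return "keep testing"
    return "pick A" if loss_a < loss_b else "pick B"


if __name__ == "__main__":
    # Hypothetical posteriors from a small and a large experiment.
    print(decide((13, 89), (11, 91)))
    print(decide((1201, 8801), (1001, 9001)))
```

The threshold plays the role of a "how much loss do I care about" knob, so it still has to be chosen by a person, but at least it is stated explicitly.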

jules (OP) 11y ago

You can just show the posterior and let your brain be the decision rule. You can visually see the difference in conversion rate and the uncertainty around it. That info makes it easy to decide whether to continue the test or stop the test and pick the best performer. Much better information to base a decision on than a hypothesis test with a significance threshold that people pull out of their ass.

If you want to be fancy you could even implement a strategy that maximizes the total conversions based on Bayesian decision theory, so that it automatically tends to show the best performer as time goes on.
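One well-known strategy of this kind is Thompson sampling, sketched below as an illustration. It is a heuristic that tends to favor the best performer over time, not necessarily the exact Bayes-optimal policy, and it may not be the specific strategy meant here. The true conversion rates are hypothetical and only exist to simulate visitors.

```python
import random


def thompson_sampling(true_rates, visitors, seed=0):
    """Simulate Thompson sampling with a flat Beta(1, 1) prior per page.

    For each visitor: draw one sample from every page's posterior,
    show the page with the highest draw, then update that page's counts.
    Returns how many times each page was shown.
    """
    rng = random.Random(seed)
    n_pages = len(true_rates)
    successes = [0] * n_pages
    failures = [0] * n_pages
    for _ in range(visitors):
        draws = [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(n_pages)
        ]
        page = max(range(n_pages), key=draws.__getitem__)
        # Simulated visitor: converts with the page's (unknown) true rate.
        if rng.random() < true_rates[page]:
            successes[page] += 1
        else:
            failures[page] += 1
    return [successes[i] + failures[i] for i in range(n_pages)]


if __name__ == "__main__":
    # Hypothetical true rates: page 1 is clearly better than page 0.
    print(thompson_sampling([0.05, 0.15], visitors=5000))
```

Early on both pages get shown; as evidence accumulates, the weaker page is sampled less and less often.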

That article is weird. It uses a normal distribution as the prior for the conversion rate. That could produce a negative conversion rate or a conversion rate above 100%. Then in the section "So why doesn’t everyone already do this?" they say "The answer is simple - it’s computationally inefficient." No shit if you are using a normal prior. A much better way to do this is to use a beta prior (or a Dirichlet prior in case you have more than 2 alternatives). Then the math becomes trivial & fast and you don't have nonsense negative or above 100% conversion rates.

yummyfajitas 11y ago

I didn't say hypothesis test, I said decision rule. The method I describe in the article has only two quantities "pulled out of the ass" - the threshold of caring and the prior. If you visually inspect the posterior, you are implicitly pulling out of your ass an unknown "threshold of visual similarity".

"That article is weird. It uses a normal distribution as the prior for the conversion rate."

That's incorrect. From the article: "To begin we will choose a Beta distribution prior." The computational intensiveness is not caused by the choice of prior, it's caused by the need to evaluate an integral over the joint posterior.

A Dirichlet prior is also not what you'd use for more than 2 alternatives - you have two beta distributions, one representing the posterior for the control and the other for the variation. If you had a second variation, you'd have 3 beta distributions, and you'd need to evaluate a 3 dimensional integral.
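As an illustration of the point about independent beta posteriors, the sketch below approximates the multi-dimensional integral by sampling instead of evaluating it directly: draw from each variant's Beta posterior and count how often each one comes out on top. This trades exactness for sampling noise and is not claimed to be the article's method. The three posteriors are hypothetical.

```python
import random


def prob_each_is_best(posteriors, draws=50000, seed=0):
    """Monte Carlo estimate of P(variant i has the highest rate).

    `posteriors` is a list of (alpha, beta) pairs, one independent
    Beta posterior per variant. Sampling all of them jointly and
    counting winners approximates the integral over the joint posterior.
    """
    rng = random.Random(seed)
    wins = [0] * len(posteriors)
    for _ in range(draws):
        sample = [rng.betavariate(a, b) for a, b in posteriors]
        wins[sample.index(max(sample))] += 1
    return [w / draws for w in wins]


if __name__ == "__main__":
    # Hypothetical posteriors: control plus two variations.
    posteriors = [(101, 901), (121, 881), (111, 891)]
    print(prob_each_is_best(posteriors))
```

Adding a variant adds one more entry to the list rather than one more dimension of numerical integration, at the cost of needing enough draws to keep the noise down.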
