I don't use Bayes factors in my research (2019)

(datacolada.org)

91 points · jasonhansel · 2y ago · 75 comments

p-e-w · 2y ago · 7 in thread

The statistical interpretation of observations is so subtle and complex that it's a good idea to assume that any publication from the empirical sciences is complete garbage, until you know for sure that a qualified statistician has supervised the process. A semester of "introduction to statistical methods" (which is all the background that most scientists have) is NOT enough.

Imagine a mathematician writing a paper on a medical topic, making all kinds of claims on how things work in the human body – and then that mathematician justifies their expertise by saying "I did a two-week first aid course once, and also, I was really good at biology in school". This is pretty much how lots of science operates when it comes to interpreting results mathematically.

concinds · 2y ago

I once read a survey which found that a huge number of PhDs/researchers in the studied sample gave an incorrect definition of what a "95% confidence interval" (or p-value, etc.) actually means, and that several popular introductory textbooks defined it incorrectly as well. Wish I had bookmarked it.

At bare minimum, journals need to require that researchers publish all their data alongside every paper, so statistical analyses can be redone and flaws can be spotted.

constantcrying · 2y ago

I think you may be talking about "Mindless statistics" by Gigerenzer. He has some surveys about p-values and how badly they are usually misinterpreted.

>At bare minimum, journals need to require that researchers publish all their data alongside every paper, so statistical analyses can be redone and flaws can be spotted.

Absolutely.

samch93 · 2y ago

probably this article: Hoekstra, R., Morey, R.D., Rouder, J.N. et al. Robust misinterpretation of confidence intervals. Psychon Bull Rev 21, 1157–1164 (2014). https://doi.org/10.3758/s13423-013-0572-3

another good article on misinterpretation of p-values and confidence intervals is: Greenland, S., Senn, S.J., Rothman, K.J. et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 31, 337–350 (2016). https://doi.org/10.1007/s10654-016-0149-3

bakuninsbart · 2y ago

While I agree on the data point, it would kill so much research. It is bad that a lot of research validation basically comes down to "trust me guys", but with data being both very valuable and often highly sensitive, it can be really difficult to just publish the data along with the research.

A decent compromise would be to at least require enough metadata to rule out some flaws. A different approach could be to have researchers document and publish the process of the research, similar to a git repo whose main branch is completely off limits to history rewriting.

blablabla123 · 2y ago

On the other hand, there also seems to be a lack of test subjects. It's already frequently pointed out that medical results might not represent everyone. I would also assume that e.g. pharmaceutical certification processes do apply more sophisticated statistics.

p-e-w · 2y ago

The ugly truth is that lots of "science" being done today isn't actually science – it's a performance art that superficially imitates certain behaviors that are associated with real science.

And how could it be otherwise? There are nearly 10 million scientists in the world right now. And all of them are pushing out papers as fast as humanly possible. There isn't anywhere near enough statistical brainpower available to quality-control all of that. Not to mention that most people with sufficient expertise in statistics have better things to do than micromanaging science grad students who have a hard time comprehending Bayes' theorem.

KRAKRISMOTT · 2y ago

Ah, I see you have not met a biophysicist.

civilized · 2y ago · 4 in thread

Bayesian methods are not easy to use, I agree. But that's because they're trying to answer much more meaningful (but harder) questions than frequentist ones, which researchers should be trying to do. You can't ignore Bayesian epistemology just by not using Bayesian methods. The underlying considerations in the Bayesian framework will inevitably become relevant to how you interpret your data, whether or not you use a formal Bayesian method.

The thing is, formally, frequentist methods like Null Hypothesis Significance Testing don't tell you what you really want to know. If you get a significant p-value, that means the data you observed wouldn't often happen by chance (within your model of the null). This doesn't actually tell you if your particular hypothesis should be favored. That requires other considerations, including ones that Simonsohn is negative about in this article.

For example, Simonsohn's conclusion says:

> To use Bayes factors to test hypotheses: you need to be OK with the following two things:

> 1. Accepting the null when “the alternative” you consider, and reject, does not represent the theory of interest.

> 2. Rejecting a theory after observing an outcome that the theory predicts.

He implies that these should be points against Bayes factors. But #2 is something you actually should do sometimes. Demonstrably. If the data suggests a wildly implausible effect size that doesn't show up consistently in other analyses, that should be a point against your theory and in favor of some more mundane explanation, like noisy data from an underpowered study [1].

Not using Bayesian methods is understandable if you don't feel comfortable with the very heavy demands they can make on your statistics acumen. But if you're, say, a social scientist incentivized to get "sexy" results and you refuse to engage with Bayesian epistemology at all, your career will almost certainly just be a contribution of more noise publications to the replication crisis.

[1] http://www.stat.columbia.edu/~gelman/presentations/ziff.pdf

kkoncevicius · 2y ago

> Bayesian methods are not easy to use, I agree. But that's because they're trying to answer much more meaningful (but harder) questions than frequentist ones

I think your view about the difference between Frequentist and Bayesian methods is wrong. There is this rant I like from Larry Wasserman [1] on the subject:

  My opinions have shifted a bit. [...] Bayes-Frequentist debate still matters. And people — including many statisticians — are still confused about the distinction. I thought the basic Bayes-Frequentist debate was behind us. A year and a half of blogging (as well as reading other blogs) convinced me I was wrong here too. And this still does matter.
  My emphasis on high-dimensional models is germane, however. In our world of high-dimensional, complex models I can’t see how anyone can interpret the output of a Bayesian analysis in any meaningful way.
  I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs. In the high dimensional world, you have to choose: objective frequency guarantees or subjective beliefs. Choose whichever you prefer, but you can’t have both. I don’t care which one people pick; I just wish they would be clear about what they are giving up when they make their choice.
  [...]
  Of course, one can embrace objective Bayesian inference. If this means “Bayesian procedures with good frequentist properties” then I am all for it. But this is just frequentist inference in Bayesian clothing.

[1]: https://errorstatistics.com/2013/12/27/deconstructing-larry-...

civilized · 2y ago

Having a specific opinion isn't inherently a bias, and it's rather uninspiring to put effort into a substantial comment and then get a reply which is essentially nothing but a baseless accusation of bias followed by a long block quote of unclear relevance (it's extremely far from true that all statistical analyses worth consideration are in high dimensions).

Review the Hacker News guidelines:

> Edit out swipes.

https://news.ycombinator.com/newsguidelines.html

EDIT: much appreciated.

SiempreViernes · 2y ago

Honestly I can't tell how the quote is supposed to augment your point.

kgwgk · 2y ago

> But this is just frequentist inference in Bayesian clothing.

Or maybe this is "just" frequentist inference done right - on a Bayesian foundation.

Konohamaru · 2y ago · 4 in thread

Milton Friedman was correct: because the true minimum wage is $0.00 (unemployment), he was correct to compare wage increase to the null hypothesis. The potshot in the opening paragraph ("Milton feels bad about the unemployed but good about his theory.") is simultaneously an appeal to emotion and a presumptuous ad hominem.

travisjungroth · 2y ago

Potshot? It just seems like a joke, but one that puts this character (a nod to Milton Friedman but not like a serious insert) in a positive light. He’s pleased by being correct but sympathetic since he was right about something bad happening to people (unemployment).

systemvoltage · 2y ago

If anyone is taking potshots, it’s the author.

constantcrying · 2y ago

The point isn't that Friedman is right or wrong, but that the statistical model tells him to reject his hypothesis, even though he observed a result consistent with his hypothesis.

>is simultaneously an appeal to emotion and a presumptuous ad hominem.

I don't see how that is the case.

ggm · 2y ago

Because it makes Friedman heartless. He feels bad but he still promulgated theories which wreaked the badness he felt bad about. So it goes to character.

If he _really_ felt bad, he'd have done what Norbert Wiener did and moved out of the field. He stayed an economist. Not so bad feeling, eh?

jldugger · 2y ago · 3 in thread

> Note: By theory I merely mean the rationale for investigating the effect of x on y. A theory can be as simple as “I think people value a mug more once they own it”.

Hoo boy, the [2019] is well deserved on this one -- that's a Dan Ariely reference from before The 2021 Accusation and before the recent NPR story refuting his excuse[1].

[1]: https://www.npr.org/2023/07/27/1190568472/dan-ariely-frances...

neonate · 2y ago

What is the reference? I don't get it.

jldugger · 2y ago

I could swear one of his research projects involved asking people to make a mug and pricing it out after they finished. But I guess it was Kahneman that researched mugs? Whoops

fsckboy · 2y ago

https://en.wikipedia.org/wiki/Dan_Ariely#Accusations_of_data...

travisjungroth · 2y ago · 2 in thread

I’ve been working a lot with Bayes factors lately. I don’t want to sound cultish, but I think part of the issue is this stuff doesn’t work “half way”. As soon as you’re talking about the null hypothesis and Bayes factors, you’re mixing up two schools of thought that don’t play nice.

Bayes factors work with comparing models. There is no null model. What, 0% effect? Ok, there was a non-zero effect. That model loses since it put the probability of 0% at 1 and everything else at 0. And if you do anything else, you’re encoding some amount of belief into the model, some judgment you’ve made.

So, you need to pick two models and compare them. I’m not saying this is right for science. It’s working well for my purposes. One model meaning “as planned”, one model meaning “not as planned”, use the Bayes factor to decide if things are going as planned. But you do need to be explicit about what models you’re comparing. You have to be able to just put some data in and get a probability back, or it’s not going to work.

This is what makes this criticism of Bayes factors so unpersuasive. They’re very easy to calculate, but they’re never calculated here! It’s just the ratio of marginal likelihoods, the probability of the data under the model.
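To make "the ratio of marginal likelihoods" concrete, here is a minimal Python sketch. It assumes binomial data and two Beta priors on the success rate; the names `as_planned` and `not_as_planned` and their parameters are hypothetical stand-ins, not anything taken from the article or from the comment above.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def marginal_likelihood(k, n, a, b):
    """P(k successes in n trials) when the success rate has a Beta(a, b) prior.

    This is the beta-binomial probability: the binomial likelihood
    averaged over the prior.
    """
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def bayes_factor(k, n, model_a, model_b):
    """Ratio of the marginal likelihoods of the same data under two models."""
    return marginal_likelihood(k, n, *model_a) / marginal_likelihood(k, n, *model_b)


# Hypothetical models: each is just a pair of Beta prior parameters.
as_planned = (20.0, 80.0)     # assumption: rate believed to be near 20%
not_as_planned = (1.0, 1.0)   # assumption: flat prior, any rate equally likely

# Hypothetical data: 18 successes in 100 trials.
bf = bayes_factor(k=18, n=100, model_a=as_planned, model_b=not_as_planned)
print(f"Bayes factor (as planned vs not as planned): {bf:.2f}")
```

Changing either pair of prior parameters changes the result, which is the sense in which each model being compared encodes a judgment.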

LudwigNagasena · 2y ago

> Bayes factors work with comparing models.

So does the traditional Neyman–Pearson hypothesis testing.

> There is no null model.

Why can’t there be?

> What, 0% effect? Ok, there was a non-zero effect. That model loses since it put the probability of 0% at 1 and everything else at 0%.

Well, if your null hypothesis is deterministic and says 0% effect, getting anything other than 0% absolutely will make you reject the null hypothesis. But most of the time hypotheses are not deterministic. Usually you sample random variables.

> And if you do anything else, you’re encoding some amount of belief into the model, some judgment you’ve made.

Traditional hypothesis testing is a particular case of minimising risk, i.e. the expected value of your loss given possible models and your decision rule. You don’t assume any belief about the probability of a specific model being true, thus I think it is incorrect to claim that you encode a belief. You don’t even claim that it can be measured.

Of course, that makes it impossible to quantify the risk over all possible models. Thus, you only deal with Type I and Type II errors, whose values presume that the null or the alternative hypothesis, respectively, is correct.

If you have a probability measure for models, you can simply average your risks over it and get what is known as Bayes risk. That would be encoding some belief.
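The distinction above can be sketched numerically. The example below is an illustration under stated assumptions: a one-sided test on a sample mean with known standard deviation, a simple null (mean 0) against a simple alternative (mean `mu_alt`), and 0-1 loss, so that the risk under each hypothesis is just the corresponding error probability. The function names and numbers are hypothetical.

```python
from statistics import NormalDist


def error_rates(threshold, n, mu_alt, sigma=1.0):
    """Type I and Type II error rates for the rule 'reject if sample mean > threshold'.

    Each rate is computed assuming one specific hypothesis is true;
    no probability is assigned to the hypotheses themselves.
    """
    se = sigma / n ** 0.5
    type_1 = 1.0 - NormalDist(0.0, se).cdf(threshold)   # reject although the null is true
    type_2 = NormalDist(mu_alt, se).cdf(threshold)      # keep the null although the alternative is true
    return type_1, type_2


def bayes_risk(type_1, type_2, prior_null):
    """Average the two risks over a prior on the hypotheses (0-1 loss)."""
    return prior_null * type_1 + (1.0 - prior_null) * type_2


n = 25
se = 1.0 / n ** 0.5
threshold = NormalDist().inv_cdf(0.95) * se   # chosen so the Type I rate is 5%

alpha, beta = error_rates(threshold, n, mu_alt=0.5)
risk = bayes_risk(alpha, beta, prior_null=0.5)   # the prior is the encoded belief
print(f"Type I: {alpha:.3f}, Type II: {beta:.3f}, Bayes risk: {risk:.3f}")
```

Only the last step needs `prior_null`; the two error rates are defined without it.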

travisjungroth · 2y ago

You've sliced up what I've said to the point it doesn't really make sense. This is exactly what I said was confusing. I'm talking about Bayes factors and you're talking about null hypothesis testing.

I'll just answer your question, why there can't be a null model. You can have a hypothesis that represents all differences between groups are due to chance. To make this a statistical model, something that can calculate the probability of an event, you have to make assumptions. Maybe it's just about the distribution. Maybe it's independence. But, it's always something. You said it yourself "You don’t assume any belief on the probability of a specific model to be true." To be a statistical model, to calculate the probability of an event, to calculate marginal likelihoods, to calculate Bayes factors, you have to do that.

This is largely a philosophical point. You can have a null model. Something you pick to represent "no effect". But there's not the null, this belief free model that's categorically different from a model with priors.

If there's a belief-free model that can give a marginal likelihood, then I'm wrong. I'd also very much like to know about it.

maxminminmax · 2y ago · 1 in thread

This is one of the most strawman (to put it mildly) things I have ever read.

constantcrying · 2y ago

Where is it?

edbaskerville · 2y ago

I screwed around with trying to compute Bayes factors for models of distributions over set partitions, having been led astray by Bayesian phylogenetic inference methods. It was a waste of time--in practice the epistemology was terrible because the choice of prior distributions had such a huge effect on model comparisons. On top of that, the computations were highly unstable so I had to do a lot of fancy multi-temperature MCMC stuff that never quite worked.

Unless your priors are based on actual observations, stick with model selection approaches that are based on measured predictive power, or at least plausible approximations thereof, e.g. Aki Vehtari et al. LOO-CV (approximate leave-one-out cross-validation):

https://avehtari.github.io/modelselection/

https://mc-stan.org/loo/

kgwgk · 2y ago

> wait until you understand Bayes Factors

I'm not sure that piece will help people to understand Bayes Factors: https://statmodeling.stat.columbia.edu/2019/09/10/i-hate-bay...

> In social science, theory alone will not deliver one [hypothesis to test]

I guess it's difficult to test a hypothesis when you don't really have one.

MrManatee · 2y ago

If the minimum wage is increased $4, the competing explanations seem to be:

1. Change in unemployment is normally distributed with mean 0% and standard deviation 0.606%.

2. Change in unemployment is uniformly distributed between 1% and 10%.

I don't really agree that "(1) vs (2)" is a particularly good formulation of the original question ("Would raising the minimum wage by $4 lead to greater unemployment?"). But if it were, how would the math work out?

If we observe that unemployment increases 1%, then yes, that piece of evidence is very slightly in favor of explanation (1). This doesn't feel weird or paradoxical to me. But surely we wouldn't want to decide the matter based just on that one inconclusive data point? Instead we would want to look at another instance of the same situation. In that case, an increase of, say, 6% would (almost) conclusively settle the matter in favor of (2), and an increase of, say, 0.8% would (absolutely) conclusively settle the matter in favor of (1).
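The arithmetic behind these three cases can be checked with a short Python sketch. It takes the two explanations exactly as stated above (a normal density with mean 0 and standard deviation 0.606, and a uniform density on 1 to 10, both in percentage points) and assumes, as a simplification, that repeated observations are independent; the function names are made up for illustration.

```python
from math import exp, inf, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))


def uniform_pdf(x, low, high):
    """Density of a uniform distribution at x (zero outside [low, high])."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def bayes_factor_1_vs_2(observations):
    """Likelihood of explanation (1) divided by likelihood of explanation (2).

    Observations are treated as independent, so their densities multiply.
    """
    like_1 = 1.0
    like_2 = 1.0
    for x in observations:
        like_1 *= normal_pdf(x, 0.0, 0.606)
        like_2 *= uniform_pdf(x, 1.0, 10.0)
    if like_2 == 0.0:
        # Explanation (2) rules the data out entirely.
        return inf if like_1 > 0.0 else float("nan")
    return like_1 / like_2


print(bayes_factor_1_vs_2([1.0]))        # a little above 1: slightly favors (1)
print(bayes_factor_1_vs_2([1.0, 6.0]))   # tiny: strongly favors (2)
print(bayes_factor_1_vs_2([1.0, 0.8]))   # inf: (2) assigns zero density to 0.8
```

The infinite result in the last case comes purely from the hard lower bound of the uniform explanation, which is why 0.8% settles the matter "absolutely" while 6% only does so "almost".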

fridental · 2y ago

So you have just one data point and you want to do statistics about it? No matter what you do, the results won't be useful.

In the Bayesian approach, you start with some distribution that is a wild guess and doesn't even need to be based on any knowledge beyond the basics of how money works and the fact that unemployment cannot be 0% or 100%. Each data point refines your distribution until, at some dataset size, it converges to something that estimates reality.

You might want to watch an amazingly helpful introduction by Richard McElreath here https://www.youtube.com/watch?v=guTdrfycW2Q
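The refinement process described in this comment can be sketched with a conjugate Beta prior. This is a deliberately simplified illustration: it assumes each data point is a yes/no outcome with a fixed underlying rate, which is not the unemployment model from the article, and the data below are invented.

```python
def update(belief, observation):
    """Update a Beta(a, b) belief about a rate with one yes/no observation."""
    a, b = belief
    return (a + 1.0, b) if observation else (a, b + 1.0)


def mean(belief):
    """Mean of a Beta(a, b) distribution."""
    a, b = belief
    return a / (a + b)


def variance(belief):
    """Variance of a Beta(a, b) distribution."""
    a, b = belief
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


prior = (1.0, 1.0)          # the "wild guess": every rate equally plausible
data = [1, 0, 0, 0] * 50    # hypothetical data with a 25% rate of "yes"

belief = prior
for observation in data:
    belief = update(belief, observation)

print(f"prior mean {mean(prior):.3f}, variance {variance(prior):.4f}")
print(f"posterior mean {mean(belief):.3f}, variance {variance(belief):.6f}")
```

With more data the posterior mean moves toward the observed frequency and the variance shrinks, provided the assumed model is a reasonable description of how the data were generated.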

tpoacher · 2y ago

p-values aren't problematic. How people use them is.

Same with Bayes factors. I've seen people claim "anything above 3 is significant".

Incidentally, the theory behind p-values is actually beautiful, and p-values can generalise really well in theory, but in practice most people don't know this.

E.g., did you know that you can have "bayesian" p-values? (in the sense that the p-value can be designed to take priors and other models into account, without violating its definition in any way)

frankreyes · 2y ago

Put the minimum wage at $1,000,000.

Then, watch unemployment go up.
