Imagine a mathematician writing a paper on a medical topic, making all kinds of claims on how things work in the human body – and then that mathematician justifies their expertise by saying "I did a two-week first aid course once, and also, I was really good at biology in school". This is pretty much how lots of science operates when it comes to interpreting results mathematically.
At bare minimum, journals need to require that researchers publish all their data alongside every paper, so statistical analyses can be redone and flaws can be spotted.
>At bare minimum, journals need to require that researchers publish all their data alongside every paper, so statistical analyses can be redone and flaws can be spotted.
Absolutely.
Another good article on the misinterpretation of p-values and confidence intervals is: Greenland, S., Senn, S.J., Rothman, K.J. et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 31, 337–350 (2016). https://doi.org/10.1007/s10654-016-0149-3
A decent compromise would be to at least require enough metadata to rule out some flaws. A different approach could be to have researchers document and publish the process of the research, similar to a git repo with the main branch being completely off limits to history-rewriting.
And how could it be otherwise? There are nearly 10 million scientists in the world right now. And all of them are pushing out papers as fast as humanly possible. There isn't anywhere near enough statistical brainpower available to quality-control all of that. Not to mention that most people with sufficient expertise in statistics have better things to do than micromanaging science grad students who have a hard time comprehending Bayes' theorem.
The thing is, formally, frequentist methods like Null Hypothesis Significance Testing don't tell you what you really want to know. If you get a significant p-value, that means the data you observed wouldn't often happen by chance (within your model of the null). This doesn't actually tell you if your particular hypothesis should be favored. That requires other considerations, including ones that Simonsohn is negative about in this article.
For example, Simonsohn's conclusion says:
> To use Bayes factors to test hypotheses: you need to be OK with the following two things:
> 1. Accepting the null when “the alternative” you consider, and reject, does not represent the theory of interest.
> 2. Rejecting a theory after observing an outcome that the theory predicts.
He implies that these should be points against Bayes factors. But #2 is something you actually should do sometimes. Demonstrably. If the data suggests a wildly implausible effect size that doesn't show up consistently in other analyses, that should be a point against your theory and in favor of some more mundane explanation, like noisy data from an underpowered study [1].
Not using Bayesian methods is understandable if you don't feel comfortable with the very heavy demands they can make on your statistics acumen. But if you're, say, a social scientist incentivized to get "sexy" results and you refuse to engage with Bayesian epistemology at all, your career will almost certainly just be a contribution of more noise publications to the replication crisis.
[1] http://www.stat.columbia.edu/~gelman/presentations/ziff.pdf
I think your view about the difference between Frequentist and Bayesian methods is wrong. There is this rant I like from Larry Wasserman [1] on the subject:
> My opinions have shifted a bit. [...] Bayes-Frequentist debate still matters. And people — including many statisticians — are still confused about the distinction. I thought the basic Bayes-Frequentist debate was behind us. A year and a half of blogging (as well as reading other blogs) convinced me I was wrong here too. And this still does matter.
> My emphasis on high-dimensional models is germane, however. In our world of high-dimensional, complex models I can’t see how anyone can interpret the output of a Bayesian analysis in any meaningful way.
> I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs. In the high dimensional world, you have to choose: objective frequency guarantees or subjective beliefs. Choose whichever you prefer, but you can’t have both. I don’t care which one people pick; I just wish they would be clear about what they are giving up when they make their choice.
> [...]
> Of course, one can embrace objective Bayesian inference. If this means “Bayesian procedures with good frequentist properties” then I am all for it. But this is just frequentist inference in Bayesian clothing.
[1]: https://errorstatistics.com/2013/12/27/deconstructing-larry-...
Review the Hacker News guidelines:
> Edit out swipes.
https://news.ycombinator.com/newsguidelines.html
EDIT: much appreciated.
Or maybe this is "just" frequentist inference done right - on a Bayesian foundation.
>is simultaneously an appeal to emotion and a presumptuous ad hominem.
I don't see how that is the case.
If he _really_ felt bad, he'd have done what Norbert Wiener did and moved out of the field. He stayed an economist. Not so bad feeling, eh?
Hoo boy, the [2019] is well deserved on this one -- that's a Dan Ariely reference from before The 2021 Accusation and before the recent NPR story refuting his excuse[1].
[1]: https://www.npr.org/2023/07/27/1190568472/dan-ariely-frances...
Bayes factors work with comparing models. There is no null model. What, 0% effect? Ok, there was a non-zero effect. That model loses since it put the probability of 0% at 1 and everything else at 0. And if you do anything else, you’re encoding some amount of belief into the model, some judgment you’ve made.
So, you need to pick two models and compare them. I’m not saying this is right for science. It’s working well for my purposes. One model meaning “as planned”, one model meaning “not as planned”, use the Bayes factor to decide if things are going as planned. But you do need to be explicit about what models you’re comparing. You have to be able to just put some data in and get a probability back, or it’s not going to work.
This is what makes this criticism of Bayes factors so unpersuasive. They’re very easy to calculate, but they’re never calculated here! It’s just the ratio of marginal likelihoods, the probability of the data under the model.
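To make "ratio of marginal likelihoods" concrete, here is a minimal sketch. The setup is my own assumption, not something from the article: k successes in n trials, an "as planned" model that fixes the success rate at 0.5, and a "not as planned" model that puts a uniform prior on the rate. The function names are made up for illustration.

```python
from math import comb


def marginal_likelihood_point(k, n, p):
    """P(data | model) when the model fixes the success probability at p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def marginal_likelihood_uniform(k, n):
    """P(data | model) when the success probability has a Uniform(0, 1) prior.

    Integrating the binomial likelihood over that prior gives 1 / (n + 1)
    for every possible k.
    """
    return 1.0 / (n + 1)


def bayes_factor(k, n, p_planned=0.5):
    """Ratio of marginal likelihoods: 'as planned' over 'not as planned'."""
    return marginal_likelihood_point(k, n, p_planned) / marginal_likelihood_uniform(k, n)


# Hypothetical data: 10 of 20 successes, then 18 of 20 successes.
bf_balanced = bayes_factor(k=10, n=20)
bf_lopsided = bayes_factor(k=18, n=20)
print(bf_balanced, bf_lopsided)
```

With these made-up numbers the first ratio comes out around 3.7 (mild support for "as planned") and the second is far below 1. Both models had to commit to a probability for the data, which is the point about being explicit.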
So does the traditional Neyman–Pearson hypothesis testing.
> There is no null model.
Why can’t there be?
> What, 0% effect? Ok, there was a non-zero effect. That model loses since it put the probability of 0% at 1 and everything else at 0%.
Well, if your null hypothesis is deterministic and says 0% effect, getting anything other than 0% absolutely will make you reject the null hypothesis. But most of the time hypotheses are not deterministic. Usually you sample random variables.
> And if you do anything else, you’re encoding some amount of belief into the model, some judgment you’ve made.
Traditional hypothesis testing is a particular case of minimising risk, i.e. the expected value of your loss given possible models and your decision rule. You don’t assume any belief about the probability of a specific model being true, thus I think it is incorrect to claim that you encode a belief. You don’t even claim that it can be measured.
Of course, that makes it impossible to quantify the risk over all possible models. Thus, you only deal with Type I and Type II errors, whose values presume that the null or the alternative hypothesis, respectively, is correct.
If you have a probability measure for models, you can simply average your risks over it and get what is known as Bayes risk. That would be encoding some belief.
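A small sketch of that distinction, under assumptions I am choosing purely for illustration: a single observation X ~ N(mu, 1), a null of mu = 0, an alternative of mu = 1, 0-1 loss, and a rule that rejects the null when X exceeds a threshold. The names `error_rates` and `bayes_risk` are hypothetical.

```python
from statistics import NormalDist


def error_rates(threshold, mu_alt=1.0):
    """Type I and Type II error rates of the rule 'reject if X > threshold'."""
    # Type I: probability of rejecting, computed as if the null (mu = 0) is true.
    type_1 = 1.0 - NormalDist(mu=0.0, sigma=1.0).cdf(threshold)
    # Type II: probability of not rejecting, computed as if the alternative is true.
    type_2 = NormalDist(mu=mu_alt, sigma=1.0).cdf(threshold)
    return type_1, type_2


def bayes_risk(threshold, prior_null, mu_alt=1.0):
    """Average the two conditional risks over a prior on the models."""
    type_1, type_2 = error_rates(threshold, mu_alt)
    return prior_null * type_1 + (1.0 - prior_null) * type_2


type_1, type_2 = error_rates(1.645)
risk = bayes_risk(1.645, prior_null=0.5)
print(type_1, type_2, risk)
```

The first function never needs a probability for the models themselves; each error rate is conditional on one hypothesis being correct. Only `bayes_risk` takes `prior_null`, and that argument is where the belief gets encoded.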
I'll just answer your question about why there can't be a null model. You can have a hypothesis that all differences between groups are due to chance. To make this a statistical model, something that can calculate the probability of an event, you have to make assumptions. Maybe it's just about the distribution. Maybe it's independence. But, it's always something. You said it yourself "You don’t assume any belief on the probability of a specific model to be true." To be a statistical model, to calculate the probability of an event, to calculate marginal likelihoods, to calculate Bayes factors, you have to do that.
This is largely a philosophical point. You can have a null model. Something you pick to represent "no effect". But there's not the null, this belief free model that's categorically different from a model with priors.
If there's a belief-free model that can give a marginal likelihood, then I'm wrong. I'd also very much like to know about it.
Unless your priors are based on actual observations, stick with model selection approaches that are based on measured predictive power, or at least plausible approximations thereof, e.g. Aki Vehtari et al. LOO-CV (approximate leave-one-out cross-validation):
I'm not sure that piece will help people to understand Bayes Factors: https://statmodeling.stat.columbia.edu/2019/09/10/i-hate-bay...
> In social science, theory alone will not deliver one [hypothesis to test]
I guess it's difficult to test a hypothesis when you don't really have one.
1. Change in unemployment is normally distributed with mean 0% and standard deviation 0.606%.
2. Change in unemployment is uniformly distributed between 1% and 10%.
I don't really agree that "(1) vs (2)" is a particularly good formulation of the original question ("Would raising the minimum wage by $4 lead to greater unemployment?"). But if it were, how would the math work out?
If we observe that unemployment increases 1%, then yes, that piece of evidence is very slightly in favor of explanation (1). This doesn't feel weird or paradoxical to me. But surely we wouldn't want to decide the matter based just on that one inconclusive data point? Instead we would want to look at another instance of the same situation. In that case, an increase of, say, 6% would (almost) conclusively settle the matter in favor of (2), and an increase of, say, 0.8% would (absolutely) conclusively settle the matter in favor of (1).
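For anyone who wants to check the arithmetic, here is a rough sketch using only the two distributions stated in (1) and (2). Because each model is fully specified, the Bayes factor for a single observation is just the ratio of the two densities at that point. The function names are mine.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean=0.0, sd=0.606):
    """Density of model (1): Normal with mean 0% and sd 0.606%."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def uniform_pdf(x, low=1.0, high=10.0):
    """Density of model (2): Uniform between 1% and 10%."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def bayes_factor_1_vs_2(x):
    """Density under (1) divided by density under (2) for one observation."""
    denominator = uniform_pdf(x)
    if denominator == 0.0:
        # Model (2) says this outcome is impossible.
        return float("inf")
    return normal_pdf(x) / denominator


for change in (1.0, 6.0, 0.8):
    print(change, bayes_factor_1_vs_2(change))
```

A 1% increase gives a factor of about 1.5 for (1), which is barely evidence at all. A 6% increase gives a factor that is astronomically small, and 0.8% gives infinity because (2) assigns it zero density. If you are willing to treat two observations as independent (my assumption), the combined factor is the product of the individual ones.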
In the Bayesian approach, you start with some distribution that is a wild guess and doesn't even need to be based on any knowledge beyond the basics of how money works and the fact that unemployment cannot be 0% or 100%. Each data point refines your distribution until, at some dataset size, it converges to something that estimates reality.
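A toy version of that refinement, with everything assumed for illustration: a normal prior on the average change in unemployment, observations with a known standard deviation of 1 percentage point, and made-up data. This is the textbook normal-normal conjugate update, not anything specific to the article.

```python
def update_normal(prior_mean, prior_sd, observation, obs_sd):
    """One conjugate update of a normal prior on the mean, with known obs_sd."""
    prior_precision = 1.0 / prior_sd**2
    obs_precision = 1.0 / obs_sd**2
    post_precision = prior_precision + obs_precision
    post_mean = (prior_precision * prior_mean + obs_precision * observation) / post_precision
    return post_mean, post_precision**-0.5


# Hypothetical observed changes in unemployment, in percentage points.
observations = [0.4, 0.7, 0.5, 0.6, 0.3, 0.5]

# Deliberately vague starting guess: centered on 0 with a very wide spread.
mean, sd = 0.0, 10.0
for x in observations:
    mean, sd = update_normal(mean, sd, x, obs_sd=1.0)
    print(f"after observing {x}: mean={mean:.3f}, sd={sd:.3f}")
```

The wide prior is swamped almost immediately: the posterior mean moves toward the average of the data and the posterior sd shrinks with every point.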
You might want to watch an amazingly helpful introduction by Richard McElreath here https://www.youtube.com/watch?v=guTdrfycW2Q
Same with Bayes factors. I've seen people claim "anything above 3 is significant".
Incidentally, the theory behind p-values is actually beautiful, and p-values can generalise really well in theory, but in practice most people don't know this.
E.g., did you know that you can have "Bayesian" p-values? (In the sense that the p-value can be designed to take priors and other models into account, without violating its definition in any way.)
Then, watch unemployment go up.