Redefine statistical significance

(nature.com)

303 points · arstin · 8y ago · 114 comments

79 comments · 19 top-level

gattilorenz8y ago· 9 in thread

It can hardly hurt, but it is still a stopgap measure. It won't solve publication bias, and people will still change the hypothesis or the test after measurements are done.

I think the situation would improve with better teaching of philosophy of science and statistics (this would educate better reviewers too).

agentofoblivion8y ago

Agree. It doesn't stop p-hacking, it just makes it harder. Definitely treating the symptom instead of the disease. We ultimately need institutional and cultural change, but it's not obvious how to do that in the short term, so making it harder to claim significance might be a step in the right direction.

On the other hand, you might expect that new discoveries by nature have less data since data is likely more expensive for brand new research, and by extension a lower likelihood of meeting these sorts of stringent statistical requirements. Decreasing the p-value threshold may be counter-productive if we dismiss legitimate new discoveries due to essentially economic constraints with data gathering, which would have the impact of making it less likely to get funding to pursue the problem in more depth, thereby slowing the advance of discoveries.

jjoonathan8y ago

> It doesn't stop p-hacking, it just makes it harder.

I could see the reverse happening, where higher p-value standards lead to normalization of deviance in the form of worse p-hacking.

1 more reply

epistasis8y ago

It can hurt, in that it can slow the spread of information. If you perform 70% fewer different types of experiments because you have to hit p=0.005 instead of p=0.05, then you explore in fewer directions.

This is a classic tradeoff between exploration and exploitation in active learning.

If your view of the world is that there are only a very few hypotheses worth exploring, and you have a good lay of the scientific land, then requiring a higher bar of proof is probably good.

If it's a new field that's extremely complex and where very little is known of the governing principles, then requiring very high stats could severely slow progress and waste lots of research dollars.

I completely agree that rather than setting arbitrary barriers for significance, it would seem much better to let people actually understand what was found, at whatever significance it was. Even setting up the null model to get a p-value requires tons of assumptions. The better test is reproducibility and predictive models that can be validated or invalidated. That's where the science is, and not in the p.

nonbel8y ago

> "It can hurt, in that it can slow the spread of information."

I am not at all in favor of this proposal, but one thing it may do is stem the tidal wave of misinformation.

btilly8y ago

Yours is a theoretical concern.

The very practical concern is that entire areas of research have been based on studies replicated and backed up entirely through p-hacking and selectively publishing only papers with positive results. This is a proven issue today. See https://en.wikipedia.org/wiki/Replication_crisis for more.

It may be that there is a pendulum that needs to swing a few times to get to a good tradeoff. But it is clear, now, which direction it needs to swing.

1 more reply

gboudrias8y ago

> It won't solve publication bias, and people will still change the hypothesis or the test after measurements are done.

As a Psychology student, I can say this is a well-known initiative: https://cos.io/prereg/

(Though I can't confirm or deny its widespread usage.)

The publication bias is harder, and pre-registration won't solve this. But I think this is a separate issue, and it's important to address each issue in its own right.

I've seen the proposal from TFA before and with my very limited knowledge, I'm still fairly certain it will never come to pass in Psychology, as nearly half of all modern studies have reproducibility issues (!). It would be beneficial to our field, in the way that a band-aid is beneficial to a gaping wound, but it would require a lot more rigor than has evidently been displayed so far (and more rigor is more work, and time is limited).

So... Don't hold your breath.

(Sorry if my comment sounds pessimistic, I don't know much and I'm open to being corrected. I still have enough critical thought to be skeptical of some researchers' dedication to intellectual rigor.)

BeetleB8y ago

>I think the situation would improve with better teaching of philosophy of science and statistics (this would educate better reviewers too).

This is necessary, but not sufficient. What's needed is a way to know for sure that the hypothesis was not changed after data collection. I think predeclaring the hypothesis is the way to go.

gattilorenz8y ago

Yeah, but at the end you can still fabricate the data, remove "outliers",... Plus it's almost impossible to imagine a world where, before any experiment in any field, you predeclare it.

Not that education can fix all of these (you can't prevent evil), but if reviewers, journals, and conferences started to accept negative results more readily, the incentive to lie would quickly decrease. And people would probably start to "disprove" interesting theories, instead of trying to "prove" niche results...

1 more reply

setzer228y ago

Genuine curiosity here. What's wrong with running an experiment and, when the results clearly contradict your initial assumptions (so much that the opposite is confirmed), publishing the results you found?

3 more replies

jimmar8y ago· 8 in thread

Not a huge fan of this idea. For example, people who analyze twitter data can get very small p-values because they analyze millions of tweets even though the effects they find are very small. See https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1336700
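To make the point about huge samples concrete, here is a minimal sketch, assuming a two-sided two-sample z-test with unit variance and an observed difference exactly equal to the stated effect size. The function name and the numbers are illustrative assumptions, not taken from the linked paper.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_p_value(effect_size_sd: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test.

    Assumes unit variance in both groups and that the observed mean
    difference equals effect_size_sd (measured in standard deviations).
    """
    z = effect_size_sd * sqrt(n_per_group / 2)
    # Use the lower tail so very small p-values keep their precision.
    return 2 * NormalDist().cdf(-abs(z))


# The same tiny effect (1% of a standard deviation) at two sample sizes.
p_small_sample = two_sample_p_value(0.01, 100)
p_huge_sample = two_sample_p_value(0.01, 1_000_000)

print(f"n = 100 per group:       p = {p_small_sample:.3f}")
print(f"n = 1,000,000 per group: p = {p_huge_sample:.2e}")
```

With a hundred observations per group the effect is nowhere near significant; with a million per group the same effect clears even the 0.005 threshold easily, while remaining just as small.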

moultano8y ago

I'd rather hear about small things that are true than large things that are false.

cropsieboss8y ago

The thing is that these small things might just be noise from some confounding factors.

http://jaoa.org/article.aspx?articleid=2517494

For example, here the sample size is huge; the USA population shows a significantly increased risk, while the EU population does not. Mixing the two together would result in a smaller but still significant increased risk.

Given the size, it's quite clear that the USA population has many other confounding factors that cannot be eliminated by mathematics alone (there is no control).

nonbel8y ago

Statistical significance doesn't mean an effect is "true", or "real". So getting rid of it shouldn't cause you any issues.

stewbrew8y ago

The problem here is that you most likely hear about false positives.

amelius8y ago

I don't see the problem as long as you clearly separate significance and effect-size.

seanwilson8y ago

This bothers me a lot about media reporting. The headline will be something like "X is good/bad for you" with a tiny effect size, but the way it's reported makes you assume it's a large effect size. Usually the effect size won't be discussed in any depth or at all; they just want to sum it up in black and white. If the effect size is tiny it's probably just noise that would disappear in a higher quality future study.

mhermher8y ago

This is why I think better multiple-test-correction methods are more important and would do more to produce a desirable outcome than just lowering alpha.

stewbrew8y ago

Multiple-test correction usually implies lowering alpha, doesn't it?
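As an illustration of the question above, here is a minimal sketch of the Bonferroni correction, the simplest multiple-test correction, which does work by lowering the per-test alpha. The function names and the example p-values are made up for illustration; other corrections (for example, those controlling the false discovery rate) behave differently.

```python
def bonferroni_alpha(family_alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return family_alpha / num_tests


def bonferroni_reject(p_values: list[float], family_alpha: float = 0.05) -> list[bool]:
    """Flag which hypotheses are rejected after Bonferroni correction."""
    threshold = bonferroni_alpha(family_alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values from ten tests run on the same data set.
example_p_values = [0.001, 0.004, 0.012, 0.03, 0.04, 0.2, 0.35, 0.5, 0.7, 0.9]
per_test_alpha = bonferroni_alpha(0.05, len(example_p_values))
decisions = bonferroni_reject(example_p_values)

print(f"per-test alpha: {per_test_alpha:.4f}")
print(f"rejected: {sum(decisions)} of {len(decisions)}")
```

Note that with ten tests and a family-wise alpha of 0.05, the per-test threshold comes out at 0.005, the same number the paper proposes as a blanket threshold.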

1 more reply

tw10108y ago· 7 in thread

This still doesn't feel satisfying. Part of me is still not really happy with the philosophical foundations of statistics. Does anyone know of any legitimately competing theory to statistics? Maybe something that doesn't rest on the same types of mathematics that Fisher and crew relied on when all this started? Pure mathematics has come a long way in the last fifty years but little has seeped into the applied world.

1688y ago

How & why exactly are you unhappy with statistics?

robterrin8y ago

Uh oh. If you don't watch out you'll end up a Bayesian. http://www.stat.columbia.edu/~gelman/research/unpublished/p_...

Is this along the lines of what you were hoping to find? Here's more: http://andrewgelman.com/2016/12/13/bayesian-statistics-whats...

drabiega8y ago

It's likely that the problems with statistics are inherently due to the nature of knowledge, so alternative formulations are not likely to help much.

scryder8y ago

Statistics arises from a set of axioms, assumed truths, which can be used to prove all other things in the field.

You can take a look at the three axioms people use to justify statistics. If you are willing to accept them, all else that relies on them (without using new axioms) must be true:

https://en.wikipedia.org/wiki/Probability_axioms

This same logic is used to justify development in pure mathematics: choose a set of axioms which you accept as ground truths, and prove things using them. As long as you are unable to prove your axioms are contradictory, and the axiom choice seems acceptable, then the work that you've done (with respect to them) is philosophically justified.

tw10108y ago

Statistics and probability are different things. I'm fine with the foundations of probability.

1 more reply

BeetleB8y ago

Please don't treat probability and statistics as one.

tnecniv8y ago

> Part of me is still not really happy with the philosophical foundations of statistics.

You mean you aren't happy with...probability?

imh8y ago· 6 in thread

>For a wide range of common statistical tests, transitioning from a P value threshold of α = 0.05 to α = 0.005 while maintaining 80% power would require an increase in sample sizes of about 70%.

This proposal is a great pragmatic step forward. Like they say in the paper, it doesn't solve all problems, but it would be an improvement with reasonable cost and tremendous benefits.

>Such an increase means that fewer studies can be conducted using current experimental designs and budgets. But Fig. 2 shows the benefit: false positive rates would typically fall by factors greater than two. Hence, considerable resources would be saved by not performing future studies based on false premises.

stdbrouw8y ago

In some fields like psychology, power is more likely to already be 10% or 20% for the majority of studies, and in fact P-hacking and low standards for evidence would be far less harmful if power were higher, because low power leads to inflated effect size estimates. Additionally, power calculations are always just a guess and easy to fudge, so it's pretty much a given that current statistical power would not be maintained with more stringent critical values. See http://andrewgelman.com/2014/11/17/power-06-looks-like-get-u...

So this proposal is really the opposite of pragmatic. Pragmatic would be requiring effect size estimates and confidence intervals in all published papers. It is surprising how many papers will talk about highly significant effects without actually discussing how large the estimated effect is thought to be, which gives authors a lot of leeway when exaggerating the importance of their findings.

maxerickson8y ago

So ultimately the issue is that push button statistics don't work?

1 more reply

reilly30008y ago

The cost of increasing sample size is significant; this is a trade-off that allows smaller projects to still conduct valuable research.

1 more reply

autokad8y ago

Almost all problems have to do with data and the selection of data. Changing the p-value threshold wouldn't help.

coverband8y ago

I upvoted you in principle, but working with a tighter threshold would also make choosing self-serving data samples more obvious, if not more difficult.

scottlocklin8y ago

The real solution is social scientists using statistics properly. Data mining 0.005 p-values isn't practically speaking much harder than 0.05. And some of these clowns are actually data mining on purpose; I've listened to them talking about it.

aheilbut8y ago· 6 in thread

No one in biology would be able to publish anything.

nonbel8y ago

There are so many problems with this:

1) The p-value filter leads to publication bias.

-You should publish your results anyway, or the study wasn't designed/performed correctly. The raw data and description of methods should be valuable.

2) The null hypothesis is (almost) always false anyway.

-Everything in bio/psych/etc has a real (not spurious) non-zero correlation with everything else, so the significance level just determines how much data needs to be collected to reject it.

3) Rejection or not of the null hypothesis does not indicate whether the theory/explanation of interest is correct, so is inappropriate for deciding whether a result is interesting to begin with.

-Usually the null hypothesis is very precise and the "alternative statistical hypothesis" that maps to the research hypothesis is very vague, so many alternative research hypotheses may explain the results.

marcosdumay8y ago

I would add that 4) Studies with a large p-value but that are not contradicted by others are much more valuable than studies that have a small p-value but all contradict each other.

1 more reply

pfortuny8y ago

Imagine psychology... The end of a science.

Strilanc8y ago

Funny, I was thinking the opposite.

Imagine psychology... done properly. The beginning of a science.

(I realize that "beginning" is too harsh, but psychology does have very serious problems with replicability. At the moment, it deserves its tarnished reputation.)

aimager8y ago

it's too far away

kingkawn8y ago

Or it never was one

s17n8y ago· 5 in thread

It used to be possible to have a successful academic career without publishing much - for example, one of my philosophy profs in college (at a top 10 school) had never published anything after his dissertation (he got his PhD in the early 60s).

Of course, this system only worked because academia was a bastion of the male WASP elites that didn't have much pretense of serving the broader public. But at least you didn't have the torrent of mediocre papers that you see today.

adekok8y ago

> academia was a bastion of the male WASP elites that didn't have much pretense of serving the broader public

Have things really changed? I suspect there are fewer males, but any job that demands 20 years of full-time concerted effort is likely to be dominated by men. Similarly, the western world is overwhelmingly caucasian, so again... the best predictor (now as then) is that white male professors will be represented disproportionately.

> at least you didn't have the torrent of mediocre papers that you see today.

That certainly is true. Stats for the humanities and social sciences are that 80% of the papers have zero citations. That is, they make no contribution to the greater body of human work.

In Physics (my background), most papers have 2-3 citations, and only a small percentage have 1 or fewer.

I would say that if a discipline is dominated by uncited papers, then that discipline is probably a waste of time. And the professors who work in it are a net drain on society.

tnecniv8y ago

As a note, WASP refers to old money families with ties going back to the colonial era, not just middle-class/wealthy white dudes in America. Also, at least in STEM departments, you will see plenty of non-white names.

> In Physics (my background), most papers have 2-3 citations, and only a small percentage have 1 or fewer

Does that account for self-citations?

1 more reply

nzjrs8y ago

An equally plausible interpretation is that universities have been transformed from teaching institutions into paper factories.

rebuilder8y ago

So... what the academics did wasn't very helpful, but at least they didn't do much of it?

marcosdumay8y ago

Those ones taught. Whether it was helpful or not depends on how good the teaching was and how useful the knowledge is. Not everybody in a teaching institution should be required to push humanity's knowledge forward.

But then you get the problem of selecting those people without easily measured objective indicators. That's why it worked reasonably well when those were slightly low paying jobs restricted to a caste.

leemailll8y ago· 4 in thread

Changing the p-value threshold from 0.05 to 0.005 won't stop p-hacking. It might also lead to more disgruntled graduate students, as they will then have to increase sample sizes to satisfy the new test, which inevitably lengthens the already painfully long time it takes for projects to get published.

taeric8y ago

To be fair, this is a pragmatic, not a technical, solution. Similarly, we limit the speeds we allow in residential areas not because it prevents wreckless driving, but because it decreases the actual risk of it.

Similarly, the technical solution involves technology that does not require drivers and has no risk of human error anymore. The pragmatic solution is to just limit the acceptable speeds.

nkrisc8y ago

A bit of humorous pedantry: We seek to prevent reckless driving. "Wreckless" driving is what we're trying to promote.

1 more reply

kharms8y ago

I've not spent much time in academia, but it was my impression that p-hacking is driven primarily by ignorance rather than willful deceit. If that is the case, a stricter threshold would indeed limit p-hacking, as there's usually a finite number of variables being looked at.

Edit: or rather, it would limit false positives that show up as a result of accidental p-value hacking, if not the process itself.
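The "finite number of variables" argument can be sketched numerically. Assuming the tests are independent and every null hypothesis is true (both simplifying assumptions; real analyses usually have correlated variables), the chance of at least one accidental false positive among k tests is 1 - (1 - alpha)^k:

```python
def chance_of_any_false_positive(alpha: float, num_tests: int) -> float:
    """Probability of at least one false positive among independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests


# Twenty variables checked, as in the xkcd jelly bean comic.
num_variables = 20
risk_at_005 = chance_of_any_false_positive(0.05, num_variables)
risk_at_0005 = chance_of_any_false_positive(0.005, num_variables)

print(f"alpha = 0.05:  {risk_at_005:.1%} chance of a spurious 'finding'")
print(f"alpha = 0.005: {risk_at_0005:.1%} chance of a spurious 'finding'")
```

Under these assumptions the stricter threshold cuts the risk from roughly 64% to under 10% for twenty looks at the data, though a researcher who keeps adding variables can still push the risk back up.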

std_throwaway8y ago

Science would benefit from a little less noise.

eelkefolmer8y ago· 3 in thread

It's time to ditch significance levels altogether and use Bayesian inference or analysis.

stewbrew8y ago

I'd rather say it's time to recall what significance levels were actually meant to be and in which context they are useful, and to ditch the contemporary aberration thereof.

Any system can be gamed. Bayesian statistics just isn't there yet.

analog318y ago

I'm concerned that prior-hacking will become the new p-hacking.

Houshalter8y ago

So you require people report Bayes factors, not posteriors/priors. Those are invariant to the prior.

iovrthoughtthis8y ago· 3 in thread

When can we have scientific papers formatted for the web? Reading PDFs with many tiny columns spread across each page puts me off reading so much.

folli8y ago

Most journals have an HTML and a PDF version (as does Nature): https://www.nature.com/articles/s41562-017-0189-z

I prefer the PDF version for print outs.

iovrthoughtthis8y ago

Thank you for this. I had no idea. Now I can actually read the paper!

mjpuser8y ago

This is an interesting point considering the World Wide Web was born from the need to share scientific info.

SubiculumCode8y ago· 2 in thread

This is fine, but without other simultaneous changes it will do harm to young scientists. We need credit for publishing null results, or an end to judgment on the basis of publication count. It would lead to larger, better-powered studies (good), but this tends to lead to acquiring multiple measures, which can be inappropriately data-mined, and to large grants for established investigators but fewer grants for new investigators.

jerrytsai8y ago

Definitely. The main problem is that in the current system no one is being rewarded for good science, but for showing something interesting, bolstered by a declaration of (statistical) significance. The incentives are not aligned with societal objectives.

Good science requires a tension between hypothesis generation and skepticism. Perhaps if we rewarded the _debunking_ of findings as much as we do the discovery of findings, things would change.

adrianratnapala8y ago

Why doesn't this happen already?

The funding bodies etc., who want "quantitative" measures of research, look at publications. Why would we expect debunking papers to be published if they are debunking something interesting?

2 more replies

logicallee8y ago· 2 in thread

Can someone explain why this three-page article has 72 "authors"? That works out to about as much writing per author as this comment.

arstinOP8y ago

Given the kind of paper this is, I assume the names should be understood as an endorsement. Sorta like signatures on a petition.

Klockan8y ago

Easy, in academia you can be the (co)author of a paper you've never even read.

Houshalter8y ago· 2 in thread

The concept of statistical significance is nonsense. In Bayesian statistics there is only evidence. A p value of 0.05 is roughly equivalent to a factor 20 of evidence. That means you multiply the odds you believe in a hypothesis by 20 (or add 13 decibels.) Similarly a p value of 0.005 is roughly equivalent to 200 units of evidence (23 decibels.)

But whether some amount of evidence is "significant" or not is entirely dependent on your prior. If you believe something has about a 50:50 chance of being true to start with, then a factor 20 of evidence is quite enough. Now you believe it 20:1 likely to be true.

But for something like xkcd's "green jelly beans cause cancer", your prior should be something like 1 to 100,000 or even smaller. After all, there are a lot of possible foods and a lot of possible diseases. Unless you believe a significant number of them are dangerous, your prior for any specific food causing any specific disease must be pretty low. And then even a factor 200 of evidence is nowhere near enough to convince me that green jelly beans cause cancer.
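The odds arithmetic in this comment can be written out as a short sketch. It takes the commenter's rough mapping (p = 0.05 as a factor of 20, p = 0.005 as a factor of 200) as given rather than as an established equivalence; the function names are illustrative.

```python
from math import log10


def update_odds(prior_odds: float, evidence_factor: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * evidence factor."""
    return prior_odds * evidence_factor


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor (e.g. 20 for 20:1) to a probability."""
    return odds / (1 + odds)


def evidence_in_decibels(evidence_factor: float) -> float:
    """Express a multiplicative evidence factor on a decibel scale."""
    return 10 * log10(evidence_factor)


# A 50:50 prior combined with a factor of 20 of evidence.
coin_flip_posterior = odds_to_probability(update_odds(1.0, 20))

# A 1-to-100,000 prior (the jelly bean case) with a factor of 200.
jelly_bean_posterior = odds_to_probability(update_odds(1 / 100_000, 200))

print(f"factor 20  = {evidence_in_decibels(20):.1f} dB")
print(f"factor 200 = {evidence_in_decibels(200):.1f} dB")
print(f"50:50 prior, factor 20:      {coin_flip_posterior:.3f}")
print(f"1:100,000 prior, factor 200: {jelly_bean_posterior:.4f}")
```

Starting from even odds, a factor of 20 gives about a 95% posterior probability, while the jelly bean hypothesis stays at roughly 0.2% even after a factor of 200, which is the commenter's point about priors.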

maxerickson8y ago

If it is nonsense you shouldn't be able to translate it coherently into various levels of evidence.

Houshalter8y ago

P values aren't nonsense and correlate with Bayesian evidence. I think that interpreting levels of evidence as "significant" or not is nonsense.

rgejman8y ago· 2 in thread

Animal experiments will get A LOT more expensive. Will there be a concomitant increase in agency funding to offset the increased costs?

siginfo8y ago

They do briefly mention "the relative cost of type I versus type II errors". Both errors (Type I - false positive, Type II - false negative) have some cost associated.

Money saved by using a small sample size is wasted trying to replicate a false positive result and by groups around the world that rely on that false result.

Requiring larger sample sizes would mean fewer experiments are carried out but we will have more confidence in the positive results produced. The outcome is fewer experiments wasted on following up on false positives. None of this requires a change in funding.

rgejman8y ago

I really don't think the proposal to do "fewer, but better" experiments works for animal studies. They are so expensive and so complicated and so much work and only answer singular, small questions that you almost always need a ton of further follow up work.

For instance, in the field I work in you have to spend days to months waiting for tumors to grow and then go and treat the animals every day for a couple of weeks with an IV drug (weekends too!). That is a lot of work and at the end only tells you one piece of information about the drug: does it slow tumor growth in this one experimental model. It may in fact do that -- and you may get a really great p-value if you increase the number of mice -- but you still need to study the drug's pharmacokinetics, tissue distribution, in vivo mechanism of action (assuming you already know the in vitro mechanism of action). These are not just optional experiments that we require today to publish: this kind of work is essential to presenting a story about a new drug. It's not just about what it does, but how it works and how generalizable it is.

mnarayan018y ago· 1 in thread

Curious if anyone's done any work to determine if changing the P-value threshold for e.g. Psychology studies (as they call out Psychology in particular) measurably affected replicability with p > 0.005?

zeckalpha8y ago

There's overlap in authorship with this paper: http://www.sciencemag.org/content/349/6251/aac4716

Fomite8y ago

Stop worshipping p-values set to an arbitrary threshold, whether it be 0.05 or 0.005, and start actually critically engaging with the statistics and results themselves.

wavegeek8y ago

> For a wide range of common statistical tests, transitioning from a P value threshold of α = 0.05 to α = 0.005 while maintaining 80% power would require an increase in sample sizes of about 70%.

This seems unintuitive and the claim is unreferenced. Can anyone explain why this is the case (if true)?
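One way to see where a figure of about 70% can come from is the standard normal-approximation sample size formula, in which the required n is proportional to (z_{1-α/2} + z_{power})². The sketch below assumes a two-sided z-test and a fixed true effect size; the paper's own calculation may differ in its details, so treat this as a plausibility check rather than a reproduction.

```python
from statistics import NormalDist


def relative_sample_size(alpha_old: float, alpha_new: float, power: float = 0.80) -> float:
    """Ratio of required sample sizes when the significance threshold changes.

    Assumes a two-sided z-test with a fixed effect size, where the required
    n is proportional to (z_{1 - alpha/2} + z_{power}) ** 2.
    """
    std_normal = NormalDist()
    z_power = std_normal.inv_cdf(power)

    def size_factor(alpha: float) -> float:
        z_alpha = std_normal.inv_cdf(1 - alpha / 2)
        return (z_alpha + z_power) ** 2

    return size_factor(alpha_new) / size_factor(alpha_old)


ratio = relative_sample_size(0.05, 0.005)
print(f"sample size must grow by about {ratio - 1:.0%}")
```

The critical value only moves from about 1.96 to about 2.81 standard errors, and since sample size scales with the square of the sum of the two quantiles, the required n rises by roughly 70% rather than tenfold.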

cameronraysmith8y ago

I've had some luck showing John Kruschke's Bayesian estimation supersedes the t-test (BEST) and this simple demonstration http://www.sumsar.net/best_online/ to people.

JepZ8y ago

> The choice of any particular threshold is arbitrary [...]

Sounds scientific, doesn't it?

> [...] we judge to be reasonable.

And tomorrow someone else judges it differently?

Maybe they should not try to redefine significance but simply introduce something called 'well-reproducible' or so.

md2248y ago

Just curious: would this have an effect on testing the efficacy of new drugs? I'd hate to see a false negative result for a drug that could actually help people...

j / k navigate · click thread line to collapse

114 comments

79 comments · 19 top-level

gattilorenz8y ago· 9 in thread

It can hardly hurt, but it is still a stop gap measure. It won't solve the publication bias people will still change the hypothesis or the test after measurements are done.

I think the situation would improve with better teaching of philosophy of science and statistics (this would educate better reviewers too).

agentofoblivion8y ago

jjoonathan8y ago

> It doesn't stop p-hacking, it just makes it harder.

I could see the reverse happening, where higher p-value standards lead to normalization of deviance in the form of worse p-hacking.

1 more reply

epistasis8y ago

This is a classic tradeoff between exploration and exploitation in active learning.

If your view of the world is that there are only a very few hypotheses worth exploring, and you have a good lay of the scientific land, then requiring higher bar of proof is probably good.

nonbel8y ago

> "It can hurt, in that it can slow the spread of information."

I am not at all in favor of this proposal, but one thing it may do is stem the tidal wave of misinformation.

btilly8y ago

Yours is a theoretical concern.

It may be that there is a pendulum that needs to swing a few times to get to a good tradeoff. But it is clear, now, which direction it needs to swing.

1 more reply

gboudrias8y ago

> It won't solve the publication bias people will still change the hypothesis or the test after measurements are done.

As a Psychology student, this is a well-known initiative: https://cos.io/prereg/

(Though I can't confirm or deny its widespread usage.)

The publication bias is harder, and pre-registration won't solve this. But I think this is a separate issue, and it's important to address each issue in its own right.

So... Don't hold your breath.

BeetleB8y ago

>I think the situation would improve with better teaching of philosophy of science and statistics (this would educate better reviewers too).

This is necessary, but not sufficient. What's needed is a way to know for sure that the hypothesis was not changed after data collection. I think predeclaring the hypothesis is the way to go.

gattilorenz8y ago

Yeah, but at the end you can still fabricate the data, remove "outliers",... Plus it's almost impossible to imagine a world where, before any experiment in any field, you predeclare it.

1 more reply

setzer228y ago

3 more replies

jimmar8y ago· 8 in thread

moultano8y ago

I'd rather hear about small things that are true than large things that are false.

cropsieboss8y ago

The things is that these small things might just be noise from some confounding factors.

http://jaoa.org/article.aspx?articleid=2517494

Given the size, it's quite clear that USA population has many other confounding factors that cannot be eliminated by mathematics alone (there is no control).

nonbel8y ago

Statistical significance doesn't mean an effect is "true", or "real". So getting rid if it shouldn't cause you any issues.

stewbrew8y ago

The problem here is that you most likely hear about false positives.

amelius8y ago

I don't see the problem as long as you clearly separate significance and effect-size.

seanwilson8y ago

mhermher8y ago

This is why I think better multiple-test-correction methods are more important and would lead better to a desirable outcome than just lowering alpha.

stewbrew8y ago

Multiple-test correction usually implies lowering alpha, doesn't it?

1 more reply

tw10108y ago· 7 in thread

1688y ago

How & why exactly are you unhappy with statistics?

robterrin8y ago

Uh oh. If you don't watch out you'll end up a Bayesian. http://www.stat.columbia.edu/~gelman/research/unpublished/p_...

Is this along the lines of what you were hoping to find? Here's more: http://andrewgelman.com/2016/12/13/bayesian-statistics-whats...

drabiega8y ago

It's likely that the problems with statistics are inherently due to the nature of knowledge, so alternative formulations are not likely to help much.

scryder8y ago

Statistics arises from a set of axioms, assumed truths, which can be used to prove all other things in the field.

You can take a look at the three axioms people use to justify statistics. If you are willing to accept them, all else that relies on them (without using new axioms) must be true:

https://en.wikipedia.org/wiki/Probability_axioms

tw10108y ago

Statistics and probability are different things. I'm fine with the foundations of probability.

1 more reply

BeetleB8y ago

Please don't treat probability and statistics as one.

tnecniv8y ago

> Part of me is still not really happy with the philosophical foundations of statistics.

You mean you aren't happy with...probability?

imh8y ago· 6 in thread

>For a wide range of common statistical tests, transitioning from a P value threshold of α = 0.05 to α = 0.005 while maintaining 80% power would require an increase in sample sizes of about 70%.

This proposal is a great pragmatic step forward. Like they say in the paper, it doesn't solve all problems, but it would be an improvement with reasonable cost and tremendous benefits.

stdbrouw8y ago

maxerickson8y ago

So ultimately the issue is that push button statistics don't work?

1 more reply

reilly30008y ago

The cost of increasing sample size is significant; this is a trade-off that allows smaller projects to still conduct valuable research.

1 more reply

autokad8y ago

almost all problems has to do with data and selection of data. changing the p-value threshold wouldn't help.

coverband8y ago

I upvoted you in principle, but working with a tighter threshold would also make choosing self-serving data samples more obvious, if not more difficult.

scottlocklin8y ago

aheilbut8y ago· 6 in thread

No one in biology would be able to publish anything.

nonbel8y ago

There are so many problems with this:

1) The p-value filter leads to publication bias.

-You should publish your results anyway, or the study wasn't designed/performed correctly. The raw data and description of methods should be valuable.

2) The null hypothesis is (almost) always false anyway.

-Everything in bio/psych/etc has a real (not spurious) non-zero correlation with everything else, so the significance level just determines how much data needs to be collected to reject it.

3) Rejection or not of the null hypothesis does not indicate whether the theory/explanation of interest is correct, so is inappropriate for deciding whether a result is interesting to begin with.

marcosdumay8y ago

I would add that 4) Studies with a large p-value but that are not contradicted by others are much more valuable than studies that have a small p-value but all contradict each other.

1 more reply

pfortuny8y ago

Imagine psychology... The end of a science.

Strilanc8y ago

Funny, I was thinking the opposite.

Imagine psychology... done properly. The beginning of a science.

(I realize that "beginning" is too harsh, but psychology does have very serious problems with replicability. At the moment, it deserves its tarnished reputation.)

aimager8y ago

it's too far away

kingkawn8y ago

Or it never was one

s17n8y ago· 5 in thread

adekok8y ago

> academia was a bastion of the male WASP elites that didn't have much pretense of serving the broader public

> at least you didn't have the torrent of mediocre papers that you see today.

That certainly is true. Stats for the humanities and social sciences show that 80% of the papers have zero citations, i.e. they make no contribution to the greater body of human work.

In Physics (my background), most papers have 2-3 citations, and only a small percentage have 1 or fewer.

I would say that if a discipline is dominated by uncited papers, then that discipline is probably a waste of time. And the professors who work in it are a net drain on society.

tnecniv8y ago

> In Physics (my background), most papers have 2-3 citations, and only a small percentage have 1 or fewer

Does that account for self-citations?

nzjrs8y ago

An equally plausible interpretation is that universities have been transformed from teaching institutions into paper factories.

rebuilder8y ago

So... what the academics did wasn't very helpful, but at least they didn't do much of it?

marcosdumay8y ago

leemailll8y ago· 4 in thread

taeric8y ago

Similarly, the technical solution involves technology that does not require drivers and has no risk of human error anymore. The pragmatic solution is to just limit the acceptable speeds.

nkrisc8y ago

A bit of humorous pedantry: We seek to prevent reckless driving. "Wreckless" driving is what we're trying to promote.

kharms8y ago

Edit: or rather, it would limit false positives that show up as a result of accidental p-value hacking, if not the process itself.
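
As a hedged illustration of the accidental case: if a researcher runs several independent tests where every null hypothesis is true, the chance that at least one comes out "significant" is 1 - (1 - α)^k. The independence assumption and the figure of 20 tests are mine, chosen only for the example:

```python
def prob_any_false_positive(alpha, n_tests):
    """Chance of at least one p < alpha across n_tests independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** n_tests


# Hypothetical study that tries 20 unrelated comparisons.
for alpha in (0.05, 0.005):
    print(alpha, round(prob_any_false_positive(alpha, 20), 3))
```

With these numbers the chance of a spurious hit falls from roughly 64% to roughly 10%, which reduces the problem without removing it.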

std_throwaway8y ago

Science would benefit from a little less noise.

eelkefolmer8y ago· 3 in thread

It's time to ditch significance levels altogether and use Bayesian inference or analysis.

stewbrew8y ago

I'd rather say it's time to recall what significance levels were actually meant to be and in which context they are useful, and to ditch the contemporary aberration thereof.

Any game can be gamed. Bayesian statistics just isn't there yet.

analog318y ago

I'm concerned that prior-hacking will become the new p-hacking.

Houshalter8y ago

So you require people report Bayes factors, not posteriors/priors. Those are invariant to the prior.
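
A minimal sketch of that idea, assuming two point hypotheses about a coin (p = 0.5 versus p = 0.7) and made-up data of 7 heads in 10 flips. The Bayes factor is the likelihood ratio, and posterior odds are prior odds times that factor, so readers with different prior odds can reuse the same reported number:

```python
from math import comb


def binomial_likelihood(p, heads, flips):
    """Probability of the observed head count if the coin has bias p."""
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)


def bayes_factor(p1, p0, heads, flips):
    """Likelihood ratio of point hypothesis p1 against point hypothesis p0."""
    return binomial_likelihood(p1, heads, flips) / binomial_likelihood(p0, heads, flips)


bf = bayes_factor(0.7, 0.5, heads=7, flips=10)  # hypothetical data

# Two readers with different prior odds apply the same Bayes factor.
for prior_odds in (1.0, 0.1):
    print(prior_odds, prior_odds * bf)
```

This invariance is to the prior odds between the hypotheses. When a hypothesis has free parameters, the Bayes factor still depends on the prior placed over those parameters.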

iovrthoughtthis8y ago· 3 in thread

When can we have scientific papers formatted for the web? Reading PDFs with many tiny columns spread across each page puts me off reading so much.

folli8y ago

Most journals have an HTML and a PDF version (as does Nature): https://www.nature.com/articles/s41562-017-0189-z

I prefer the PDF version for print outs.

iovrthoughtthis8y ago

Thank you for this. I had no idea. Now I can actually read the paper!

mjpuser8y ago

This is an interesting point considering the World Wide Web was born from the need to share scientific info.

SubiculumCode8y ago· 2 in thread

jerrytsai8y ago

Good science requires a tension between hypothesis generation and skepticism. Perhaps if we rewarded the _debunking_ of findings as much as we do the discovery of findings, things would change.

adrianratnapala8y ago

Why doesn't this happen already?

The funding bodies etc., who want "quantitative" measures of research, look at publications. Why would we expect debunking papers to be published if they are debunking something interesting?


logicallee8y ago· 2 in thread

Can someone explain why this three-page article has 72 "authors"? That works out to about as much writing per author as this comment.

arstinOP8y ago

Given the kind of paper this is, I assume the names should be understood as an endorsement. Sorta like signatures on a petition.

Klockan8y ago

Easy, in academia you can be the (co)author of a paper you've never even read.

Houshalter8y ago· 2 in thread

maxerickson8y ago

If it is nonsense you shouldn't be able to translate it coherently into various levels of evidence.

Houshalter8y ago

P values aren't nonsense and correlate with Bayesian evidence. I think that interpreting levels of evidence as "significant" or not is nonsense.

rgejman8y ago· 2 in thread

Animal experiments will get A LOT more expensive. Will there be a concomitant increase in agency funding to offset the increased costs?

siginfo8y ago

They do briefly mention "the relative cost of type I versus type II errors". Both errors (Type I - false positive, Type II - false negative) have some cost associated.

Money saved by using a small sample size is wasted trying to replicate a false positive result and by groups around the world that rely on that false result.

rgejman8y ago

mnarayan018y ago· 1 in thread

zeckalpha8y ago

There's overlap in authorship with this paper: http://www.sciencemag.org/content/349/6251/aac4716

Fomite8y ago

Stop worshipping p-values set to an arbitrary threshold, whether it be 0.05 or 0.005, and start actually critically engaging with the statistics and results themselves.

wavegeek8y ago

> For a wide range of common statistical tests, transitioning from a P value threshold of α = 0.05 to α = 0.005 while maintaining 80% power would require an increase in sample sizes of about 70%.

This seems unintuitive and the claim is unreferenced. Can anyone explain why this is the case (if true)?
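
One way to see where a figure near 70% can come from, as a sketch rather than the paper's own derivation: for a two-sided z-test at a fixed effect size, the required sample size is proportional to (z_{1-α/2} + z_{power})². Assuming that approximation applies, the ratio between the two thresholds is:

```python
from statistics import NormalDist


def relative_sample_size(alpha_old, alpha_new, power=0.80):
    """Ratio of required sample sizes for a two-sided z-test when the
    threshold moves from alpha_old to alpha_new at the same effect size."""
    z = NormalDist().inv_cdf
    z_power = z(power)
    n_old = (z(1 - alpha_old / 2) + z_power) ** 2
    n_new = (z(1 - alpha_new / 2) + z_power) ** 2
    return n_new / n_old


ratio = relative_sample_size(0.05, 0.005)
print(f"sample size multiplier: {ratio:.2f}")
```

Under this approximation the multiplier comes out close to 1.7, i.e. about 70% more samples. Other tests or one-sided designs would give somewhat different numbers.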

cameronraysmith8y ago

I've had some luck showing John Kruschke's Bayesian estimation supersedes the t-test (BEST) and this simple demonstration http://www.sumsar.net/best_online/ to people.

JepZ8y ago

> The choice of any particular threshold is arbitrary [...]

Sounds scientific, doesn't it?

> [...] we judge to be reasonable.

And tomorrow someone else judges it differently?

Maybe they should not try to redefine significance but simply introduce something called 'well-reproducible' or so.

md2248y ago

Just curious: would this have an effect on testing the efficacy of new drugs? I'd hate to see a false negative result for a drug that could actually help people...