Abandoning statistical significance is both sensible and practical

Valentin Amrhein; Andrew Gelman; Sander Greenland; Blakeley B McShane

doi:10.7287/peerj.preprints.27657v1

Valentin Amrhein¹, Andrew Gelman², Sander Greenland³, Blakeley B McShane⁴

1 Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland

2 Department of Political Science and Department of Statistics, Columbia University, New York, United States

3 Department of Epidemiology and Department of Statistics, University of California, Los Angeles, United States

4 Kellogg School of Management, Northwestern University, Evanston, United States

Published: 2019-04-16
Accepted: 2019-04-16

Subject Areas: Science Policy, Statistics
Keywords: Selective reporting, Hypothesis test, Replication, Significance test, Confidence interval, Unreplicable research, P-value, Effect size inflation, P-hacking, Publication bias

Copyright: © 2019 Amrhein et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.

Cite this article: Amrhein V, Gelman A, Greenland S, McShane BB. 2019. Abandoning statistical significance is both sensible and practical. PeerJ Preprints 7:e27657v1 https://doi.org/10.7287/peerj.preprints.27657v1

Abstract

To the Editor of JAMA

Dr Ioannidis writes against our proposals to abandon statistical significance in scientific reasoning and publication, as endorsed in the editorial of a recent special issue of an American Statistical Association journal devoted to moving to a “post p<0.05 world.” We appreciate that he echoes our calls for “embracing uncertainty, avoiding hyped claims…and recognizing ‘statistical significance’ is often poorly understood.” We also welcome his agreement that the “interpretation of any result is far more complicated than just significance testing” and that “clinical, monetary, and other considerations may often have more importance than statistical findings.”

Nonetheless, we disagree that a statistical significance-based “filtering process is useful to avoid drowning in noise” in science and instead view such filtering as harmful. First, the implicit rule to not publish nonsignificant results biases the literature with overestimated effect sizes and encourages “hacking” to get significance. Second, nonsignificant results are often wrongly treated as zero. Third, significant results are often wrongly treated as truth rather than as the noisy estimates they are, thereby creating unrealistic expectations of replicability. Fourth, filtering on statistical significance provides no guarantee against noise. Instead, it amplifies noise because the quantity on which the filtering is based (the p-value) is itself extremely noisy and is made more so by dichotomizing it.
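The first and fourth points can be illustrated with a small simulation. The sketch below is not part of the letter: the true effect (0.2), the standard error (0.15), the number of simulated studies, and the assumption of normally distributed estimates are all illustrative choices. It draws many study estimates around one fixed true effect, computes a two-sided p-value for each, and compares the average estimate across all studies with the average among only those reaching p < 0.05.

```python
import math
import random
import statistics


def simulate(true_effect=0.2, se=0.15, n_studies=20000, seed=1):
    """Simulate repeated studies of one true effect (all values are assumptions)."""
    rng = random.Random(seed)
    # Each study's estimate is the true effect plus normal sampling noise.
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Two-sided p-value for the z statistic estimate / se.
    p_values = [math.erfc(abs(e / se) / math.sqrt(2)) for e in estimates]
    # The "significance filter": keep only studies with p < 0.05.
    significant = [e for e, p in zip(estimates, p_values) if p < 0.05]
    return {
        "mean_all": statistics.fmean(estimates),
        "mean_significant": statistics.fmean(significant),
        "share_significant": len(significant) / n_studies,
        # Quartile cut points of the p-value across identical studies.
        "p_quartiles": statistics.quantiles(p_values, n=4),
    }


result = simulate()
for name, value in result.items():
    print(name, value)
```

Under these assumptions, the mean over all studies sits close to the true effect, while the mean over the significant subset is substantially larger, and the p-value quartiles are spread widely even though every simulated study measures the same effect with the same precision.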

We also disagree that abandoning statistical significance will reduce science to “a state of statistical anarchy.” Indeed, the journal Epidemiology banned statistical significance in 1990 and is today recognized as a leader in the field.

Valid synthesis requires accounting for all relevant evidence—not just the subset that attained statistical significance. Thus, researchers should report more, not less, providing estimates and uncertainty statements for all quantities, justifying any exceptions, and considering ways the results are wrong. Publication criteria should be based on evaluating study design, data quality, and scientific content—not statistical significance.

Decisions are seldom necessary in scientific reporting. However, when they are required (as in clinical practice), they should be made based on the costs, benefits, and likelihoods of all possible outcomes, not via arbitrary cutoffs applied to statistical summaries such as p-values, which capture little of this picture.

The replication crisis in science is not the product of the publication of unreliable findings. The publication of unreliable findings is unavoidable: as the saying goes, if we knew what we were doing, it would not be called research. Rather, the replication crisis has arisen because unreliable findings are presented as reliable.

Author Comment

This manuscript has been submitted as a “Letter to the Editor” to the journal JAMA. It is a reply to a “Viewpoint” by John Ioannidis in JAMA, https://jamanetwork.com/journals/jama/fullarticle/2730486, which was a reply to “Retire statistical significance” in Nature, https://www.nature.com/articles/d41586-019-00857-9, and to articles published in a special issue of The American Statistician, https://www.tandfonline.com/toc/utas20/73/sup1.
