Should we abandon p-values?
Friendly warning – technical post alert. This post is about my time this past week at the American Statistical Association’s symposium to discuss our p-value-heavy culture. Why do we put p-values on a pedestal?
As you may (or may not) recall from your statistics class, if you're participating in null hypothesis significance testing ("NHST"), the p-value is the probability of seeing results at least as extreme as the ones you observed, assuming your null hypothesis is true – plus various other caveats.
Ok, I realize you may have quit reading right there and started looking for puppy videos. But this next part may sound familiar – we (er, by "we" I mean statisticians...) have conditioned industry and decision-makers and physicians and physicists and journalists and many other smart people to believe that when p is less than 0.05, a significant finding has emerged from one's data.
How cute and tidy! Such a clear rule, in a gray, muddy world, that non-statisticians can keep in their back pocket.
The ASA released a statement last year on p-values that essentially admits that we all follow the “is p < 0.05” rule because that’s what we were taught, and we teach it because that’s what we follow.
Here's why we should care about this now. We are careening further down a big data path, at breakneck speeds, and the "is p < 0.05" rule means diddly in big-data-land. Yes, you heard me – as sample sizes grow (that would be the "big" in big data), p-values almost always come out small, and are therefore nearly meaningless. The math behind this is simpler than you might think: standard errors shrink like 1/√n, so with enough data even a trivially small effect will clear the p < 0.05 bar.
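You can see this for yourself with a quick simulation. Below is a minimal sketch (my own illustration, not from the symposium) using a plain normal-approximation z-test on data drawn with a tiny true effect of 0.01 – practically nothing – and watching what happens to the p-value as the sample size balloons:

```python
import math
import random

random.seed(42)

def p_value_one_sample(data, mu0=0.0):
    """Two-sided p-value for the mean via a z-test (normal approximation)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    z = (mean - mu0) / math.sqrt(var / n)
    # two-sided p-value from the standard normal survival function
    return math.erfc(abs(z) / math.sqrt(2))

# A trivially small true effect: mean 0.01, standard deviation 1.
# As n grows, the p-value collapses toward zero even though the
# effect itself is negligible.
for n in (100, 10_000, 1_000_000):
    sample = [random.gauss(0.01, 1.0) for _ in range(n)]
    print(f"n = {n:>9,}  p = {p_value_one_sample(sample):.4g}")
```

With a million observations, the p-value is astronomically small – "statistically significant" by any threshold – yet the effect is still 0.01 and still doesn't matter.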
What’s more, p-values have become a sort of publication currency for scientists, where the “right” p-value will get an article published in a journal and the “wrong,” or insignificant, p-value will end up in the rejected bin. I’m generalizing here, but this has happened.
Thankfully, there are many more signals our statistical tests can send us besides a p-value (like effect size, confidence intervals, effect sign, and the result in context with other studies). Or we can ignore the NHST approach altogether and apply Bayesian methods. Or how about simply reporting the results (parameter estimates, standard errors, etc.)?
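What does "simply reporting the results" look like in practice? Here is a small sketch (again my own, using a normal-approximation interval) that reports an effect estimate with its 95% confidence interval instead of a bare yes/no significance verdict:

```python
import math
import random

random.seed(0)

def mean_and_ci(data, level_z=1.96):
    """Point estimate and confidence interval for the mean
    (normal approximation; level_z = 1.96 gives a two-sided 95% CI)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    half = level_z * math.sqrt(var / n)
    return mean, (mean - half, mean + half)

# Report the estimate and interval, and let the reader judge whether
# an effect of this size, measured this precisely, actually matters.
sample = [random.gauss(0.2, 1.0) for _ in range(500)]
est, (lo, hi) = mean_and_ci(sample)
print(f"effect estimate: {est:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

The interval conveys both the size of the effect and the precision of the estimate – two things a lone p-value hides.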
Even after thinking about this for days, I am still on the fence – dropping this easy decision rule and well-known threshold will be difficult to implement beyond the statistical community. Demoting the p-value from its pride of place, as these authors describe, leaves us in uneasy territory. Laura Lazzeroni of Stanford brilliantly reminded symposium participants of our known unknowns and our unknown unknowns, as Donald Rumsfeld would say:
There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know. – Donald Rumsfeld
Even if industry can't easily implement the absence of a decision rule, simply fostering awareness of signals beyond the p-value would be an advancement for the scientific process.
To paraphrase the ASA, good statistics make good science.