Statistical significance - Null hypothesis - The P value: What it really means

'Statistical Significance' Is Overused And Often Misleading : . . . In the early 20th century, the father of statistics, R.A. Fisher, developed a test of significance. It involves a variable called the p-value, which he intended to be a guide for judging results. Over the years, scientists have warped that idea beyond all recognition. They've created an arbitrary threshold for the p-value, typically 0.05, and they use that to declare whether a scientific result is significant or not. This shortcut often determines whether studies get published or not, whether scientists get promoted and who gets grant funding. "It's really gotten stretched all out of proportion," says Ron Wasserstein, the executive director of the American Statistical Association. He's been advocating this change for years and he's not alone. "Failure to make these changes are really now starting to have a sustained negative impact on how science is conducted," he says. "It's time to start making the changes. It's time to move on."
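For readers unfamiliar with what a p-value actually measures, here is a minimal sketch in the spirit of Fisher's original idea: the p-value is just how often data at least as extreme as yours would arise by chance alone. This example uses a simple two-sided permutation test on made-up treatment/control numbers (all data here are hypothetical, for illustration only).

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test: the p-value is the fraction of
    random relabelings of the pooled data whose mean difference is
    at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical measurements from a small two-group comparison
treatment = [5.1, 5.8, 6.2, 5.9, 6.4]
control = [4.9, 5.2, 5.0, 5.6, 5.3]
p = permutation_p_value(treatment, control)
print(p)
```

Note that the output is a continuous number between 0 and 1 — a "guide for judging results," as Fisher intended. Nothing in the computation itself singles out 0.05 as special.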

There are many downsides to this, he says. One is that scientists have been known to massage their data to make their results hit this magic threshold. Arguably worse, scientists often find that they can't publish their interesting (if somewhat ambiguous) results if they aren't statistically significant. But that information is actually still useful, and advocates say it's wasteful simply to throw it away. There are some prominent voices in the world of statistics who reject the call to abolish the term "statistical significance." (MORE)

- - -

800 scientists say it’s time to abandon “statistical significance”: . . . “Statistical significance” is too often misunderstood — and misused. That’s why a trio of scientists writing in Nature this week are calling “for the entire concept of statistical significance to be abandoned.” [...] More than 800 other scientists and statisticians across the world have signed on to this manifesto.

[...] we know from recent years that science is rife with false-positive studies that achieved p-values of less than .05 (read my explainer on the replication crisis in social science for more). The Nature commentary authors argue that the math is not the problem. Instead, it’s human psychology. Bucketing results into “statistically significant” and “statistically non-significant,” they write, leads to an overly black-and-white approach to scrutinizing science.
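The bucketing the authors criticize can be made concrete with a toy sketch (the p-values below are hypothetical, chosen to sit just on either side of the conventional cutoff):

```python
def significance_label(p, alpha=0.05):
    """The conventional dichotomy: a hard cut at alpha,
    no matter how close p is to the threshold."""
    return "significant" if p < alpha else "not significant"

# Two studies with nearly identical evidence get opposite labels:
print(significance_label(0.049))  # significant
print(significance_label(0.051))  # not significant
```

A difference of 0.002 in the p-value — statistically meaningless in itself — flips the verdict entirely, which is exactly the black-and-white thinking the commentary objects to.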

[...] Many scientists recognize there are more robust ways to evaluate a scientific finding. And they already engage in them. But those methods somehow don’t currently hold as much power as “statistical significance.” [...] The authors of the latest Nature commentary aren’t calling for the end of p-values. They’d still like scientists to report them where appropriate, but not necessarily label them “significant” or not.

This strategy is likely to draw argument. Some might think it’s useful to have simple rules of thumb, or thresholds, to evaluate science. And we still need phrases in our language to describe scientific results. Erasing “statistical significance” might just confuse things.

In any case, changing the definition of statistical significance, or nixing it entirely, doesn’t address the real problem. [...] The biggest problem in science isn’t statistical significance; it’s the culture. ... young scientists need publications to get jobs. Under the status quo, in order to get publications, you need statistically significant results. Statistical significance alone didn’t lead to the replication crisis. The institutions of science incentivized the behaviors that allowed it to fester. (MORE - details)