How the strange idea of ‘statistical significance’ was born

C C

INTRO: In the middle of the 20th century, the field of psychology had a problem. In the wake of the Manhattan Project and in the early days of the space race, the so-called “hard sciences” were producing tangible, highly publicized results. Psychologists and other social scientists looked on enviously. Their results were squishy, and difficult to quantify.

Psychologists in particular wanted a statistical skeleton key to unlock true experimental insights. It was an unrealistic burden to place on statistics, but the longing for a mathematical seal of approval burned hot. So psychology textbook writers and publishers created one, and called it statistical significance.

By calculating just one number from their experimental results, called a P value, researchers could now deem those results “statistically significant.” That was all it took to claim — even if mistakenly — that an interesting and powerful effect had been demonstrated. The idea took off, and soon legions of researchers were reporting statistically significant results.
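For anyone curious about the mechanics, a P value can be computed without any special formulas via a permutation test: shuffle the group labels many times and ask how often chance alone produces a difference as large as the one observed. A minimal Python sketch with invented scores (all numbers made up for illustration):

```python
import random
random.seed(0)

# Hypothetical scores for two small experimental groups (invented data)
group_a = [12, 15, 14, 10, 13, 16, 11, 14]
group_b = [14, 18, 17, 15, 16, 19, 15, 17]

# Observed difference between group means
observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))

# Permutation test: reshuffle labels and count how often chance alone
# yields a difference at least as large as the observed one
pooled = group_a + group_b
n_a = len(group_a)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"observed mean difference: {observed:.2f}, p = {p_value:.4f}")
```

A p value below 0.05 is what the textbook convention described above would stamp "statistically significant" — note that the number says nothing about how large or interesting the effect is.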

To make matters worse, psychology journals began to publish papers only if they reported statistically significant findings, prompting a surprisingly large number of investigators to massage their data — either by gaming the system or cheating — to get below the P value of 0.05 that granted that status. Inevitably, bogus findings and chance associations began to proliferate.

As editor of a journal called Memory & Cognition from 1993 to 1997, Geoffrey Loftus of the University of Washington tried valiantly to yank psychologists out of their statistical rut. At the start of his tenure, Loftus published an editorial telling researchers to stop mindlessly calculating whether experimental results are statistically significant or not (SN: 5/16/13). That common practice impeded scientific progress, he warned.

Keep it simple, Loftus advised. Remember that a picture is worth a thousand reckonings of statistical significance. In that spirit, he recommended reporting straightforward averages to compare groups of volunteers in a psychology experiment. Graphs could show whether individuals’ scores covered a broad range or clumped around the average, enabling a calculation of whether the average score would likely change a little or a lot in a repeat study. In this way, researchers could evaluate, say, whether volunteers scored better on a difficult math test if first allowed to write about their thoughts and feelings for 10 minutes, versus sitting quietly for 10 minutes.
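Loftus's approach can be sketched numerically: for each group, report the mean, the spread of individual scores (sample SD), and the standard error of the mean, which estimates how much the average would wobble in a repeat study. A minimal Python sketch with invented math-test scores:

```python
import math

def summarize(scores):
    """Return mean, sample SD, and standard error of the mean."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((x - mean) ** 2 for x in scores) / (n - 1)  # sample variance
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)  # how much the mean would vary across repeats
    return mean, sd, se

# Invented scores for the two conditions in the example above
writing = [72, 80, 68, 75, 83, 77, 70, 79]   # wrote about feelings first
quiet   = [65, 71, 60, 74, 68, 63, 70, 66]   # sat quietly first

for label, scores in [("writing", writing), ("quiet", quiet)]:
    mean, sd, se = summarize(scores)
    # mean +/- 2*SE roughly brackets where the mean would land on a rerun
    print(f"{label}: mean={mean:.1f}, SD={sd:.1f}, SE={se:.1f}, "
          f"~95% range {mean - 2*se:.1f} to {mean + 2*se:.1f}")
```

Plotted as group means with error bars, two non-overlapping ranges tell the reader more at a glance than a bare "p < 0.05" would.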

Loftus might as well have tried to lasso a runaway train. Most researchers kept right on touting the statistical significance of their results.

“Significance testing is all about how the world isn’t and says nothing about how the world is,” Loftus later said when looking back on his attempt to change how psychologists do research.

What’s remarkable is not only that mid-20th century psychology textbook writers and publishers fabricated significance testing out of a mishmash of conflicting statistical techniques (SN: 6/7/97). It’s also that their weird creation was embraced by many other disciplines over the next few decades. It didn’t matter that eminent statisticians and psychologists panned significance testing from the start. The concocted calculation proved highly popular in social sciences, biomedical and epidemiological research, neuroscience and biological anthropology.

A human hunger for certainty fueled that academic movement... (MORE)
Syne

Significance Tests in Climate Science

A large fraction of papers in the climate literature includes erroneous uses of significance tests. A Bayesian analysis is presented to highlight the meaning of significance tests and why typical misuse occurs. The significance statistic is not a quantitative measure of how confident one can be of the “reality” of a given result. It is concluded that a significance test very rarely provides useful quantitative information.
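The abstract's central point — that a significance statistic is not the probability the result is real — can be shown with a toy Bayesian calculation (all likelihoods and priors below are invented for illustration, not taken from the paper):

```python
# Toy contrast between a "significant" p value and the posterior
# probability that the null hypothesis is actually true.

def posterior_null(p_d_given_h0, p_d_given_h1, prior_h0):
    """P(H0 | data) by Bayes' rule, assuming H0 and H1 are exhaustive."""
    prior_h1 = 1 - prior_h0
    num = p_d_given_h0 * prior_h0
    return num / (num + p_d_given_h1 * prior_h1)

# Data "significant" at the 0.05 level (likelihood under the null = 0.04),
# but only modestly better explained by the alternative (0.10),
# with a neutral 50/50 prior on the two hypotheses
p0 = posterior_null(p_d_given_h0=0.04, p_d_given_h1=0.10, prior_h0=0.5)
print(f"P(null | data) = {p0:.2f}")  # → P(null | data) = 0.29
```

Despite a p value that clears the 0.05 bar, the null here retains a posterior probability near 29 percent — exactly the gap between "significant" and "confident the effect is real" that the abstract describes.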

Example: Statistical significance of seasonal warming/cooling trends
