**Aug 12, 2021 07:23 PM**

**C C**
https://www.sciencenews.org/article/stat...is-origins

INTRO: In the middle of the 20th century, the field of psychology had a problem. In the wake of the Manhattan Project and in the early days of the space race, the so-called “hard sciences” were producing tangible, highly publicized results. Psychologists and other social scientists looked on enviously. Their results were squishy, and difficult to quantify.

Psychologists in particular wanted a statistical skeleton key to unlock true experimental insights. It was an unrealistic burden to place on statistics, but the longing for a mathematical seal of approval burned hot. So psychology textbook writers and publishers created one, and called it statistical significance.

By calculating just one number from their experimental results, called a P value, researchers could now deem those results “statistically significant.” That was all it took to claim — even if mistakenly — that an interesting and powerful effect had been demonstrated. The idea took off, and soon legions of researchers were reporting statistically significant results.

To make matters worse, psychology journals began to publish papers only if they reported statistically significant findings, prompting a surprisingly large number of investigators to massage their data — either by gaming the system or cheating — to get below the P value of 0.05 that granted that status. Inevitably, bogus findings and chance associations began to proliferate.

As editor of a journal called Memory & Cognition from 1993 to 1997, Geoffrey Loftus of the University of Washington tried valiantly to yank psychologists out of their statistical rut. At the start of his tenure, Loftus published an editorial telling researchers to stop mindlessly calculating whether experimental results are statistically significant or not (SN: 5/16/13). That common practice impeded scientific progress, he warned.

Keep it simple, Loftus advised. Remember that a picture is worth a thousand reckonings of statistical significance. In that spirit, he recommended reporting straightforward averages to compare groups of volunteers in a psychology experiment. Graphs could show whether individuals’ scores covered a broad range or clumped around the average, enabling a calculation of whether the average score would likelychange a little or a lot in a repeat study. In this way, researchers could evaluate, say, whether volunteers scored better on a difficult math test if first allowed to write about their thoughts and feelings for 10 minutes, versus sitting quietly for 10 minutes.

Loftus might as well have tried to lasso a runaway train. Most researchers kept right on touting the statistical significance of their results.

“Significance testing is all about how the world isn’t and says nothing about how the world is,” Loftus later said when looking back on his attempt to change how psychologists do research.

What’s remarkable is not only that mid-20th century psychology textbook writers and publishers fabricated significance testing out of a mishmash of conflicting statistical techniques (SN: 6/7/97). It’s also that their weird creation was embraced by many other disciplines over the next few decades. It didn’t matter that eminent statisticians and psychologists panned significance testing from the start. The concocted calculation proved highly popular in social sciences, biomedical and epidemiological research, neuroscience and biological anthropology.

A human hunger for certainty fueled that academic movement... (MORE)

INTRO: In the middle of the 20th century, the field of psychology had a problem. In the wake of the Manhattan Project and in the early days of the space race, the so-called “hard sciences” were producing tangible, highly publicized results. Psychologists and other social scientists looked on enviously. Their results were squishy, and difficult to quantify.

Psychologists in particular wanted a statistical skeleton key to unlock true experimental insights. It was an unrealistic burden to place on statistics, but the longing for a mathematical seal of approval burned hot. So psychology textbook writers and publishers created one, and called it statistical significance.

By calculating just one number from their experimental results, called a P value, researchers could now deem those results “statistically significant.” That was all it took to claim — even if mistakenly — that an interesting and powerful effect had been demonstrated. The idea took off, and soon legions of researchers were reporting statistically significant results.

To make matters worse, psychology journals began to publish papers only if they reported statistically significant findings, prompting a surprisingly large number of investigators to massage their data — either by gaming the system or cheating — to get below the P value of 0.05 that granted that status. Inevitably, bogus findings and chance associations began to proliferate.

As editor of a journal called Memory & Cognition from 1993 to 1997, Geoffrey Loftus of the University of Washington tried valiantly to yank psychologists out of their statistical rut. At the start of his tenure, Loftus published an editorial telling researchers to stop mindlessly calculating whether experimental results are statistically significant or not (SN: 5/16/13). That common practice impeded scientific progress, he warned.

Keep it simple, Loftus advised. Remember that a picture is worth a thousand reckonings of statistical significance. In that spirit, he recommended reporting straightforward averages to compare groups of volunteers in a psychology experiment. Graphs could show whether individuals’ scores covered a broad range or clumped around the average, enabling a calculation of whether the average score would likelychange a little or a lot in a repeat study. In this way, researchers could evaluate, say, whether volunteers scored better on a difficult math test if first allowed to write about their thoughts and feelings for 10 minutes, versus sitting quietly for 10 minutes.

Loftus might as well have tried to lasso a runaway train. Most researchers kept right on touting the statistical significance of their results.

“Significance testing is all about how the world isn’t and says nothing about how the world is,” Loftus later said when looking back on his attempt to change how psychologists do research.

What’s remarkable is not only that mid-20th century psychology textbook writers and publishers fabricated significance testing out of a mishmash of conflicting statistical techniques (SN: 6/7/97). It’s also that their weird creation was embraced by many other disciplines over the next few decades. It didn’t matter that eminent statisticians and psychologists panned significance testing from the start. The concocted calculation proved highly popular in social sciences, biomedical and epidemiological research, neuroscience and biological anthropology.

A human hunger for certainty fueled that academic movement... (MORE)