The flawed reasoning behind the Replication Crisis in science

#1
C C
http://nautil.us/issue/74/networks/the-f...ion-crisis

EXCERPT (Aubrey Clayton): Here are three versions of the same story:

1. ... Sally Clark ... was arrested and charged with two counts of murder. The pediatrician Roy Meadow, inventor of the term “Munchausen Syndrome by Proxy,” testified at the trial that it was extremely unlikely that two children from an affluent family like the Clarks would die from Sudden Infant Death Syndrome (SIDS) or “cot death.” He estimated the odds were 1 in 73 million, which he colorfully compared to an 80:1 longshot winning the Grand National horse race four years in a row. Clark was convicted and sentenced to life in prison. The press reviled her as a child murderer...

2. Suppose an otherwise healthy woman in her forties notices a suspicious lump in her breast and goes in for a mammogram. The report comes back that the lump is malignant. She wants to know the chance of the diagnosis being wrong. Her doctor answers that, as diagnostic tools go, these scans are very accurate. Such a scan would find nearly 100 percent of true cancers and would only misidentify a benign lump as cancer about 5 percent of the time. Therefore, the probability of this being a false positive is very low, about 1 in 20.

3. In 2012, Professor Ara Norenzayan at the University of British Columbia claimed to have evidence that looking at an image of Rodin’s sculpture “The Thinker” could make people less religious. [...Analytic thinking can decrease religious belief, study shows...]

All three of these vignettes involve the same error in reasoning with probabilities. The first two are examples of well-known fallacies, called, respectively, the Prosecutor’s Fallacy and the Base Rate Fallacy. The third is a typical statistical analysis of a scientific study, of the kind you can find in most any reputable journal today. In fact, Norenzayan’s results were published in Science and have to date been cited some 424 times in research literature. Atheists hailed it as scientific proof that religion was irrational; religious people were understandably offended at the suggestion that the source of their faith was a lack of reasoning ability.

The failure in reasoning at the heart of the three examples points to why so many results, in fields from astronomy to zoology, cannot be replicated, a big problem that the world of science is currently turning itself inside out trying to deal with.

The mathematical lens that allows us to see the flaw in these arguments is Bayes’ theorem. The theorem dictates that the probability we assign to a theory (Sally Clark is guilty, a patient has cancer, college students become less theistic when they stare at Rodin), in light of some observation, is proportional both to the conditional probability of the observation assuming the theory is true, and to the prior probability we gave the theory before making the observation. When two theories compete, one may make the observation much more probable, that is, produce a higher conditional probability. But according to Bayes’ rule, we might still consider that explanation unlikely if we gave it a low probability of being true from the start.
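In odds form, Bayes' rule reads: posterior odds = prior odds × likelihood ratio. A minimal sketch of that update in Python (the function name and the numbers in the example are illustrative, not from the article):

[code]
def posterior_prob(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    prior_odds       -- odds on the theory before the observation
    likelihood_ratio -- P(observation | theory) / P(observation | alternative)
    Returns the posterior probability of the theory.
    """
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# A theory with 1:9 prior odds whose observation is 5 times more likely
# if the theory is true than if it is false:
print(posterior_prob(1 / 9, 5))  # ~0.357 -- more plausible than before, still under 50%
[/code]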

So, the missing ingredient in all three examples is the prior probability for the various hypotheses. In the case of Sally Clark, the prosecution’s theory was that she had murdered her children, and double infanticide is itself an extremely rare event. Suppose, for argument’s sake, that by tallying up historical murder records we arrived at prior odds of 100 million to 1 against any particular mother like her committing double infanticide. That would have balanced the extreme unlikelihood of the observation (two infants dying) under the alternative hypothesis that they were well cared for. [...see article for details...] We’d conclude, based on these priors and no additional evidence aside from the children’s deaths, that it was actually about 58 percent likely Clark was innocent.
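A back-of-the-envelope version of that calculation in Python, using only the two numbers the excerpt supplies (Meadow's 1-in-73-million figure and the assumed 1-in-100-million prior odds of double infanticide):

[code]
# G = Clark murdered both children; I = both deaths were SIDS.
p_deaths_given_G = 1.0        # two infant deaths are certain under the murder theory
p_deaths_given_I = 1 / 73e6   # Meadow's 1-in-73-million figure for double SIDS
prior_odds_G = 1 / 100e6      # assumed prior odds of double infanticide (from the excerpt)

posterior_odds_G = prior_odds_G * (p_deaths_given_G / p_deaths_given_I)
p_innocent = 1 / (1 + posterior_odds_G)
print(p_innocent)  # ~0.58 -- about 58 percent likely Clark was innocent
[/code]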

For the breast cancer example, the doctor would need to consider the overall incidence rate of cancer among similar women with similar symptoms, not including the result of the mammogram. Maybe a physician would say from experience that about 99 percent of the time a similar patient finds a lump it turns out to be benign. So the low prior chance of a malignant tumor would balance the low chance of getting a false positive scan result. [...see article...] We’d find there was about an 83 percent chance the patient doesn’t have cancer.
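The same arithmetic with the excerpt's mammogram numbers (a sensitivity of essentially 100 percent, a 5 percent false-positive rate, and a 1 percent prior that a lump like this is malignant); the variable names are mine:

[code]
p_malignant = 0.01       # prior: ~99% of similar lumps turn out benign
sensitivity = 1.0        # the scan finds nearly 100% of true cancers
false_pos_rate = 0.05    # benign lumps misread as cancer ~5% of the time

p_positive = sensitivity * p_malignant + false_pos_rate * (1 - p_malignant)
p_benign_given_positive = false_pos_rate * (1 - p_malignant) / p_positive
print(p_benign_given_positive)  # ~0.83 -- an 83% chance the patient doesn't have cancer
[/code]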

Regarding the study of sculpture and religious sentiment, we need to assess the likelihood, before considering the data, that a brief encounter with art could have such an effect. Past experience should make us pretty skeptical, especially given the size of the claimed effect [...] Maybe we’re not so dogmatic as to rule out “The Thinker” hypothesis altogether, but a prior probability of 1 in 1,000, somewhere between the chance of being dealt a full house and four-of-a-kind in a poker hand, could be around the right order of magnitude. [...see article...] We’d end up saying the probability for “The Thinker”-atheism effect based on this experiment was 0.012, or about 1 in 83, a mildly interesting blip but almost certainly not worth publishing.
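The excerpt elides the strength of the experimental evidence, but it can be back-calculated from the stated prior (1 in 1,000) and posterior (0.012): a Bayes factor of roughly 12 in favor of the effect makes the numbers come out. That factor is an inference from the quoted figures, not a number given in the article:

[code]
prior = 1 / 1000      # skeptical prior for "The Thinker" effect
bayes_factor = 12.1   # assumed: back-calculated so the posterior matches 0.012

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * bayes_factor
posterior = posterior_odds / (1 + posterior_odds)
print(posterior)      # ~0.012, i.e. about 1 in 83
[/code]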

The problem, though, is the dominant mode of statistical analysis these days isn’t Bayesian. Since the 1920s, the standard approach to judging scientific theories has been significance testing, made popular by the statistician Ronald Fisher. Fisher’s methods and their latter-day spinoffs are now the lingua franca of scientific data analysis...

[...] Significance testing has been criticized along these lines for about as long as it’s been around. [...see article for details...] Thanks mostly to Fisher’s influence, these arguments have historically failed to win many converts to Bayesianism. But practical experience may now be starting to do what theory could not. [...]

But the [url=https://en.wikipedia.org/wiki/Replication_crisis]replication crisis[/url] won’t stop there. Similar projects have shown the same problem in fields from economics to neuroscience to cancer biology. [...see article...] We Bayesians have seen this coming for years. In 2005, John Ioannidis, now a professor at Stanford Medical School and the Department of Statistics, wrote an article titled “Why Most Published Research Findings Are False.” He showed in a straightforward Bayesian argument that if a theory, such as an association between a gene and a disease, had a low prior probability, then even after passing a test for statistical significance it could still have a low probability of being true....
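Ioannidis's point falls out of the same arithmetic. With significance level alpha, power (1 - beta), and prior probability pi that a tested association is real, the chance that a "significant" finding is true is (power * pi) / (power * pi + alpha * (1 - pi)). A sketch with illustrative numbers, not Ioannidis's exact figures:

[code]
def prob_true_given_significant(prior, alpha=0.05, power=0.8):
    """Probability that a statistically significant finding is actually true."""
    true_pos = power * prior          # real effects that reach significance
    false_pos = alpha * (1 - prior)   # null effects that reach significance anyway
    return true_pos / (true_pos + false_pos)

# A long-shot gene-disease association: say 1 in 1,000 candidates is real.
print(prob_true_given_significant(0.001))  # ~0.016 -- significant, yet ~98% likely false
[/code]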

Now, a consensus is finally beginning to emerge: Something is wrong with science that’s causing established results to fail. One proposed and long overdue remedy has been an overhaul of the use of statistics. In 2015, the journal Basic and Applied Social Psychology took the drastic measure of banning the use of significance testing in all its submissions, and this March, an editorial in Nature co-signed by more than 800 authors argued for abolishing the use of statistical significance altogether. Similar proposals have been tried in the past, but every time the resistance has been beaten back and significance testing has remained the standard. Maybe this time the fear of having a career’s worth of results exposed as irreproducible will provide scientists with the extra motivation they need.

The main reason scientists have historically been resistant to using Bayesian inference instead is that they are afraid of being accused of subjectivity. The prior probabilities required for Bayes’ rule feel like an unseemly breach of scientific ethics. Where do these priors come from? How can we allow personal judgment to pollute our scientific inferences, instead of letting the data speak for itself? But consider the supposedly “objective” probabilities in the Clark case.... (MORE - details)
#2
Yazata
While base-rate fallacies, along with bad statistics in general, may be contributing factors, I don't think we can attribute the replication crisis to them alone.

There are lots of things in play: wishful thinking and preexisting biases, pressure to publish, the desire to publish results perceived as important and new, the felt need to please funding sources, and many more.

