https://onezero.medium.com/a-i-is-solvin...3b636770cd
EXCERPTS: . . . For as long as the Department of Defense (DOD) has collected data, it has spent billions of dollars attempting to “clean” it. The dream of a master brain in which streams of clean and accurate data flow in and produce insight and greater situational awareness for governments and armies is as old as computers themselves — France tried it, Chile tried it, the Soviet Union tried it three times, undoubtedly China is trying it right now — but no matter how much data we gather, how fast or powerful the machines get, it always seems just out of reach.
Experts will point out that data scientists spend roughly 80% of their time cleaning data [...] After decades of investment, oversight, and standards development, we are no closer to total situational awareness through a computerized brain than we were in the 1970s. As the computers have gotten more advanced, the amount of data they are drowning in has increased too.
And it’s not just the DOD’s money that has failed to solve the problem. [...] Despite billions invested in A.I. moderation, the largest social media companies still rely heavily on armies of human beings to scrub the most horrific content off their platforms. It may not be a surprise that Big Government can’t get a good return on investment from A.I., but it seems Big Tech can’t either.
When attempting to engineer a solution to a hard problem, it’s worthwhile to strip things down to first principles: What assumptions are we making, and how do those assumptions frame what problems we think we need to solve? If those assumptions were different, would we be solving different problems? How do the problems we want to solve map to outcomes we value?
The outcome we’re all hoping for from A.I. is better decision-making. [...] There’s no mystery as to why the DOD would want to prioritize technology that will allow it to prevent conflict or minimize collateral damage. There’s no confusion as to why Facebook wants to control hate speech on its platform.
But research done by scientists [...] calls the value of knowing more into question. ... That seems unbelievable: Perfect information should automatically improve the decision-making process. But it doesn’t, because more information rarely changes the organizational politics behind a decision.
A.I. can correctly identify the content, but the decisions made based on that content are heavily informed by the norms and expectations of both the users and the organization. Facebook’s moderation policies, for example, allow images of anuses to be photoshopped on celebrities but not a pic of the celebrity’s actual anus. It’s easy for human beings to understand how the relationships between stakeholders make that distinction sensible: One violates the norms around free speech and public commentary; the other does not.
As long as decisions need to be made in teams, accounting for various stakeholders and their incentives, the best path to improving decision-making isn’t simply adding more sensors to get more data. You need to improve communication between stakeholders.
This raises the question: Do we need to invest billions of dollars cleaning data and sharpening our sensors in order to see benefits from A.I.?
The way we talk about data quality is misleading. We speak of “clean” data as if there is one state where data is both accurate (and bias-free) and reusable. Clean is not the same thing as accurate, and accurate is not the same thing as actionable. Problems on any one of these vectors could impede an A.I. model’s development or interfere with the quality of its results... (MORE - details)
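The independence of those three vectors can be sketched with a toy example. Everything here is hypothetical (made-up records, a made-up schema, and an invented `is_clean` check, not drawn from any real pipeline): it only illustrates that a record can pass a "clean" gate while still being inaccurate or non-actionable.

```python
# Hypothetical sensor records illustrating that "clean" (well-formatted),
# "accurate" (true), and "actionable" (usable for a decision) are
# independent properties, not a single state of the data.

records = [
    # Clean: parses, matches the schema. But suppose the sensor was
    # miscalibrated -- the reading is false, so the record is not accurate.
    {"sensor_id": "A1", "temp_c": -40.0, "ts": "2021-03-01T12:00:00Z"},
    # Clean AND (let's say) accurate, but not actionable: the timestamp
    # is weeks old, too stale to inform today's decision.
    {"sensor_id": "B2", "temp_c": 21.5, "ts": "2021-02-08T09:30:00Z"},
]

def is_clean(rec):
    """Schema check only: right keys, right value types.

    Note what this does NOT check: whether the reading is true,
    or whether it is fresh enough to act on.
    """
    return (
        set(rec) == {"sensor_id", "temp_c", "ts"}
        and isinstance(rec["sensor_id"], str)
        and isinstance(rec["temp_c"], float)
        and isinstance(rec["ts"], str)
    )

# Every record passes the "clean" gate...
assert all(is_clean(r) for r in records)
# ...yet neither would improve a decision: one is false, one is stale.
# Cleaning pipelines catch schema problems; they cannot, by themselves,
# catch accuracy or actionability problems.
```

The point of the sketch is that a validation pass of this kind is what most "data cleaning" buys you, and it leaves the other two vectors untouched.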