https://onezero.medium.com/a-i-is-solvin...3b636770cd
EXCERPTS: . . . For as long as the Department of Defense (DOD) has collected data, it has spent billions of dollars attempting to “clean” it. The dream of a master brain in which streams of clean and accurate data flow in and produce insight and greater situational awareness for governments and armies is as old as computers themselves — France tried it, Chile tried it, the Soviet Union tried it three times, undoubtedly China is trying it right now — but no matter how much data we gather, how fast or powerful the machines get, it always seems just out of reach.
Experts will point out that data scientists spend roughly 80% of their time cleaning data [...] After decades of investment, oversight, and standards development, we are no closer to total situational awareness through a computerized brain than we were in the 1970s. As the computers have gotten more advanced, the amount of data they are drowning in has increased too.
And it’s not just the DOD’s money that has failed to solve the problem. [...] Despite billions invested in A.I. moderation, the largest social media companies still rely heavily on armies of human beings to scrub the most horrific content off their platforms. It may not be a surprise that Big Government can’t get a good return on investment from A.I., but it seems Big Tech can’t either.
When attempting to engineer a solution to a hard problem, it’s worthwhile to strip things down to first principles: What assumptions are we making, and how do those assumptions frame what problems we think we need to solve? If those assumptions were different, would we be solving different problems? How do the problems we want to solve map to outcomes we value?
The outcome we’re all hoping for from A.I. is better decision-making. [...] There’s no mystery as to why the DOD would want to prioritize technology that will allow it to prevent conflict or minimize collateral damage. There’s no confusion as to why Facebook wants to control hate speech on its platform.
But research done by scientists [...] calls the value of knowing more into question. ... That seems unbelievable: Perfect information should automatically improve the decision-making process. But it doesn’t, because more information rarely changes the organizational politics behind a decision.
A.I. can correctly identify the content, but the decisions made based on that content are heavily informed by the norms and expectations of both the users and the organization. Facebook’s moderation policies, for example, allow images of anuses to be photoshopped on celebrities but not a pic of the celebrity’s actual anus. It’s easy for human beings to understand how the relationships between stakeholders make that distinction sensible: One violates the norms around free speech and public commentary; the other does not.
As long as decisions need to be made in teams, accounting for various stakeholders and their incentives, the best path to improving decision-making isn’t simply adding more sensors to get more data. You need to improve communication between stakeholders.
This raises the question: Do we need to invest billions of dollars cleaning data and sharpening our sensors in order to see benefits from A.I.?
The way we talk about data quality is misleading. We speak of “clean” data as if there is one state where data is both accurate (and bias-free) and reusable. Clean is not the same thing as accurate, and accurate is not the same thing as actionable. Problems on any one of these vectors could impede an A.I. model’s development or interfere with the quality of its results... (MORE - details)
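The independence of those three vectors can be sketched with a toy example. Everything here is hypothetical (made-up records, a made-up schema, and an invented `is_clean` check, not drawn from any real pipeline): it only illustrates that a record can pass a "clean" gate while still being inaccurate or non-actionable.

```python
# Hypothetical sensor records illustrating that "clean" (well-formatted),
# "accurate" (true), and "actionable" (usable for a decision) are
# independent properties, not a single state of the data.

records = [
    # Clean: parses, matches the schema. But suppose the sensor was
    # miscalibrated -- the reading is false, so the record is not accurate.
    {"sensor_id": "A1", "temp_c": -40.0, "ts": "2021-03-01T12:00:00Z"},
    # Clean AND (let's say) accurate, but not actionable: the timestamp
    # is weeks old, too stale to inform today's decision.
    {"sensor_id": "B2", "temp_c": 21.5, "ts": "2021-02-08T09:30:00Z"},
]

def is_clean(rec):
    """Schema check only: right keys, right value types.

    Note what this does NOT check: whether the reading is true,
    or whether it is fresh enough to act on.
    """
    return (
        set(rec) == {"sensor_id", "temp_c", "ts"}
        and isinstance(rec["sensor_id"], str)
        and isinstance(rec["temp_c"], float)
        and isinstance(rec["ts"], str)
    )

# Every record passes the "clean" gate...
assert all(is_clean(r) for r in records)
# ...yet neither would improve a decision: one is false, one is stale.
# Cleaning pipelines catch schema problems; they cannot, by themselves,
# catch accuracy or actionability problems.
```

The point of the sketch is that a validation pass of this kind is what most "data cleaning" buys you, and it leaves the other two vectors untouched.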