Error-riddled data sets are warping our sense of how good AI really is

#1
C C
https://www.technologyreview.com/2021/04...-progress/

EXCERPTS: The 10 most cited AI data sets are riddled with label errors, according to a new study out of MIT, and those errors are distorting our understanding of the field’s progress.

Data backbone: Data sets are the backbone of AI research, but some are more critical than others. There is a core set of them that researchers use to evaluate machine-learning models as a way to track how AI capabilities are advancing over time. [...] In recent years, studies have found that these data sets can contain serious flaws.

[...] Now what? Northcutt encourages the AI field to create cleaner data sets for evaluating models and tracking the field’s progress. He also recommends that researchers improve their data hygiene when working with their own data. Otherwise, he says, “if you have a noisy data set and a bunch of models you’re trying out, and you’re going to deploy them in the real world,” you could end up selecting the wrong model. To this end, he open-sourced the code he used in his study for correcting label errors, which he says is already in use at a few major tech companies... (MORE - details)
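
To make the point about picking the wrong model concrete, here is a minimal Python sketch. It is not the code from the study. The ten-example test set, the two models, and the helper names `accuracy` and `pick_best` are all invented for illustration. It only shows how a couple of wrong test labels can flip which model looks best.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the given labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def pick_best(models, labels):
    """Return the name of the model with the highest accuracy on `labels`."""
    return max(models, key=lambda name: accuracy(models[name], labels))


# Hypothetical binary test set; every value here is made up for the example.
true_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# The same test set as "published", with two label errors (indices 4 and 9).
noisy_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

models = {
    # Makes one real mistake (index 0).
    "model_a": [1, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    # Makes two real mistakes that happen to agree with the label errors.
    "model_b": [0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
}

for name, preds in models.items():
    print(name,
          "noisy:", accuracy(preds, noisy_labels),
          "clean:", accuracy(preds, true_labels))

print("best on noisy labels:", pick_best(models, noisy_labels))
print("best on clean labels:", pick_best(models, true_labels))
```

Scored against the noisy labels, model_b looks perfect and wins. Scored against the corrected labels, model_a is the better model. Real benchmarks are far larger, but the same mechanism applies whenever a model's mistakes line up with the labeling mistakes.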