Twitter thread from Benedict Evans. https://twitter.com/benedictevans/status/1098047158593277952
When a machine learning system mistakenly diagnoses an imminent failure in a hydroelectric generator because the manufacturers of the telemetric sensors constituted an unrecognised skew in the training data … this is a real-world case of ‘AI bias’.
‘AI bias’ is a data and statistics problem. You make a machine learning system that can tell X from Y by giving it lots of examples of X and of Y. But your example sets also necessarily contain ABCDE. You might not notice D, and it might skew what the machine sees as the examples.
Hence the famous system that was great at spotting sheep, until it was given a grassy hill and said ‘sheep’. Guess what was in all the examples of ‘sheep’? The system doesn’t know what sheep are – it just looked at the examples, and the hills were more noticeable than the sheep.
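The sheep-and-grass failure can be reproduced in miniature. Below is a toy sketch (not the actual system from the anecdote): a hypothetical training set in which every ‘sheep’ photo happens to be grassy, fed to a tiny naive Bayes classifier. The feature names, counts, and the classifier choice are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy dataset: every "sheep" photo happens to have a grassy
# background, so the background becomes a spurious but near-perfect predictor.
# Features: (woolly_texture, grassy_background); labels: "sheep" / "no_sheep".
train = (
    [((1, 1), "sheep")] * 40      # sheep with visible wool, on grass
    + [((0, 1), "sheep")] * 10    # sheep whose wool is hard to see, on grass
    + [((0, 0), "no_sheep")] * 50 # no sheep, no grass
)

def fit_naive_bayes(data):
    """Bernoulli naive Bayes with Laplace (+1) smoothing over binary features."""
    label_counts = Counter(lbl for _, lbl in data)
    feat_counts = {lbl: [0, 0] for lbl in label_counts}
    for x, lbl in data:
        for i, v in enumerate(x):
            feat_counts[lbl][i] += v

    def predict_proba(x):
        scores = {}
        for lbl, n in label_counts.items():
            p = n / len(data)  # class prior
            for i, v in enumerate(x):
                p_feat = (feat_counts[lbl][i] + 1) / (n + 2)
                p *= p_feat if v else (1 - p_feat)
            scores[lbl] = p
        total = sum(scores.values())
        return {lbl: s / total for lbl, s in scores.items()}

    return predict_proba

predict_proba = fit_naive_bayes(train)

# A grassy hill with no sheep at all: woolly=0, grassy=1.
probs = predict_proba((0, 1))
print(probs)  # the grassy background alone pushes the verdict to "sheep"
```

Run it and the empty grassy hill comes back ‘sheep’ with over 90% confidence: the model never learned what a sheep is, only that grass and the ‘sheep’ label co-occur.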
There are obvious cases where this problem could apply to human diversity in bad ways – ‘why exactly are you denying that person a credit card? What *exactly* is in the training examples?’ That’s a subset of the fundamental technical challenge:
Those ABCDE variables might relate to human diversity – or fluorescent versus incandescent light, or the chips used by different camera manufacturers. Meanwhile, X and Y might not be about people at all – you might be looking for natural gas leaks.
That is, this is a tool that finds patterns in data. The data might be biased by containing things you don’t realise. If it’s data about people, this bias will be about people. If it’s data about gas pipelines, the bias will be about gas pipelines.
Finally, of course, *human* analysis of both gas pipelines and people is biased, in all sorts of hidden ways. Try not to get a parole hearing in the last slot before lunch.