Are we teaching A.I. to be racist?

By Larry Velez, Sinu CTO


So I was educating myself on machine learning regression algorithms (artificial intelligence software) during lunch, as most people do, right? I watched a great 3Blue1Brown video series on the subject, and then YouTube suggested a tutorial whose example data set uses the percentage of blacks in a neighborhood as an input variable in a machine learning formula that tries to calculate home values in an area. (See 20:13 into this YouTube video from Edureka.)

I had to do some digging to track down the data set in question. It is widely circulated, and it clearly includes a variable, BK, which “is the proportion of blacks by town.”

Turns out this is an open-source dataset from the 80s and it’s being used by modern-day programmers to train artificial intelligence (A.I.) algorithms.
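For the programmers reading, here is a minimal sketch of the kind of check that could catch a variable like BK before it ever reaches a training run. Everything in it is an assumption for illustration: the column names other than BK, the keyword list, and the helper functions are hypothetical, and only the BK description is quoted from the data set.

```python
# Hypothetical data dictionary: column name -> description.
# Only the BK description is quoted from the data set; the others are made up.
DATA_DICTIONARY = {
    "ROOMS": "average number of rooms per dwelling",
    "BK": "the proportion of blacks by town",
    "TAX": "property-tax rate",
}

# Illustrative and incomplete keyword list (an assumption, not a standard).
# Matching is a crude substring check, so expect false positives and misses.
SENSITIVE_KEYWORDS = ("black", "race", "ethnic", "religion", "gender")


def flag_sensitive_columns(data_dictionary, keywords=SENSITIVE_KEYWORDS):
    """Return column names whose description mentions a sensitive keyword."""
    flagged = []
    for column, description in data_dictionary.items():
        text = description.lower()
        if any(keyword in text for keyword in keywords):
            flagged.append(column)
    return flagged


def drop_columns(rows, columns_to_drop):
    """Return copies of the rows without the given columns."""
    return [
        {key: value for key, value in row.items() if key not in columns_to_drop}
        for row in rows
    ]


flagged = flag_sensitive_columns(DATA_DICTIONARY)
sample_rows = [{"ROOMS": 6.5, "BK": 0.1, "TAX": 296}]  # made-up values
cleaned_rows = drop_columns(sample_rows, flagged)
print("Flagged for review:", flagged)
print("Rows after dropping flagged columns:", cleaned_rows)
```

A check like this only surfaces columns for a human to review. Dropping a column does not by itself remove bias, since other variables can act as proxies for it.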

Upon further investigation, I discovered multiple major universities host this dataset and some of them use it in their classes, including Vanderbilt and Carnegie Mellon University.

This has to be a fluke, right? Some antiquated real estate data set floating around, perhaps?

It seems that teaching machines human biases is a growing concern, well beyond an old set of real estate data. For instance, in a New York Times opinion piece, Dr. Dhruv Khullar, an assistant professor of health care policy and research, writes that A.I. could make “dangerous biases automated and invisible” and worsen health disparities.

“... Because A.I. is trained on real-world data, it risks incorporating, entrenching and perpetuating the economic and social biases that contribute to health disparities in the first place,” writes Dr. Khullar. “Again, evidence from other fields is instructive. A.I. programs used to help judges predict which criminals are most likely to reoffend have shown troubling racial biases, as have those designed to help child protective services decide which calls require further investigation… In medicine, unchecked A.I. could create self-fulfilling prophesies [sic] that confirm our pre-existing biases, especially when used for conditions with complex trade-offs and high degrees of uncertainty. If, for example, poorer patients do worse after organ transplantation or after receiving chemotherapy for end-stage cancer, machine learning algorithms may conclude such patients are less likely to benefit from further treatment — and recommend against it.”

The old adage “garbage in, garbage out” goes so much deeper than it did just a few years ago. Before A.I., when we gave computers the wrong data, the results were just unhelpful, not harmful to entire communities of people.

“Back then, this was mostly a problem for computer programmers and analysts,” explains Bernard Marr, Forbes contributor. “Today, when computers are routinely making decisions about whether we are invited to job interviews, eligible for a mortgage, or a candidate for surveillance by law enforcement and security services, it’s a problem for everybody.”

And indeed, this is everybody’s problem. We need A.I. to be better than humans, who carry conscious and unconscious biases. Unchecked, biased data will help shape dangerous, invisible decisions by A.I. that could haunt us for decades to come.