IdeaFest 2017: Data scientist says there’s a big problem with big data

Data scientist and author Cathy O’Neil talks about how reliance on flawed algorithms undermines societal cohesion. | Photo by Boris Ladwig

Beware of algorithms.

Data scientist and author Cathy O’Neil told young Louisvillians Tuesday at IdeaFest 2017 that, contrary to their reputation for being unbiased, correct and fair, algorithms harbor the same prejudices and flaws as their human creators.

Mathematical models are not facts, she said, but opinions embedded in math.

“We shouldn’t trust them,” she said.

Algorithms are sets of rules, and the algorithm’s creator decides which data are relevant, which is a subjective decision, she said.

For example, when she cooks for her family, she and her son use different data sets to determine the meal’s success. While O’Neil deems the meal successful if her son ate his vegetables, her son believes the meal was a success if he got to eat Nutella.

“We put subjective choices into our algorithms,” she said.
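O’Neil’s dinner example can be sketched in a few lines of code. This is only an illustration: the field names and function names below are made up, not taken from her talk. The point it shows is that the same data yields opposite verdicts depending on who defines success.

```python
# Hypothetical record of one family dinner; the field names are illustrative.
meal = {"vegetables_eaten": True, "nutella_served": False}


def success_by_parent(meal):
    """The parent's definition: the meal worked if the vegetables were eaten."""
    return meal["vegetables_eaten"]


def success_by_son(meal):
    """The son's definition: the meal worked if Nutella was served."""
    return meal["nutella_served"]


print(success_by_parent(meal))  # True
print(success_by_son(meal))     # False
```

Neither function is wrong as code. They simply encode different opinions about what counts as success.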

Increased reliance on cryptic algorithms, particularly for big decisions such as teacher accountability, creditworthiness and the risk that a criminal defendant will commit another crime, can perpetuate racism, sexism and other human failings, she warned.

O’Neil holds a doctorate in mathematics from Harvard and worked in finance in 2007, when she realized that rating agencies had rated risky mortgage-related financial instruments as safe investments and had played a big part in the most recent financial crisis.

O’Neil said when few people in the sector lost their jobs and even fewer went to jail, it made her mad.

She began looking into other areas in which humans increasingly hailed big data as a means to solve problems, only to fail miserably.

In education, for example, states used algorithms to try to close the differences in academic achievement among certain student groups. The idea, O’Neil said, was to use data to identify and remove bad teachers to improve student performance.

Initially, the programs tried to find bad teachers by tracking how many of their students performed poorly on tests. The flaw in that model, she said, is that socioeconomic and other factors play a significant role in test scores, which meant the model simply punished teachers of poor kids.

In a subsequent attempt, education leaders tried to identify bad teachers by comparing their students’ test results with the students’ projected performance. That approach, too, contained significant flaws, because test performance can be affected by factors as varied as whether the student missed breakfast that day and whether the school’s air-conditioning malfunctioned. Essentially, O’Neil said, teachers were being scored by comparing two data points that were difficult to measure.

“There was nothing trustworthy about this scoring system,” she said.
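A rough simulation shows why comparing two noisy numbers produces an unstable score. This is a sketch under invented assumptions, not the model any state actually used: the baseline of 70, the class size of 25, the noise level and the teacher’s “true effect” of 2 points are all made up for illustration.

```python
import random


def value_added_score(true_effect, class_size, noise_sd, rng):
    """Average gap between actual and projected test scores for one class.

    Both the projection and the actual result carry random error
    (a missed breakfast, a broken air conditioner), modeled here as
    Gaussian noise. All numbers are illustrative assumptions.
    """
    gaps = []
    for _ in range(class_size):
        baseline = 70.0
        projected = baseline + rng.gauss(0, noise_sd)
        actual = baseline + true_effect + rng.gauss(0, noise_sd)
        gaps.append(actual - projected)
    return sum(gaps) / len(gaps)


rng = random.Random(0)
# The same teacher, with the same true effect, scored over five school years.
yearly_scores = [value_added_score(2.0, 25, 10.0, rng) for _ in range(5)]
print([round(score, 1) for score in yearly_scores])
```

In this toy setup the teacher’s quality never changes, yet the printed scores differ from year to year, because the noise in the two measurements is large compared with the effect being measured.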

And yet, the systems had wide-ranging consequences.

Good teachers were getting fired, bad teachers were retained, and potentially good teachers stayed away from the profession because they did not want to work in an unjust system.

Bottom line, O’Neil said, the approaches created a nationwide teacher shortage and failed to improve the education system.

Pervasive, dangerous

O’Neil last year released a book on the subject, “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.”

She told students Tuesday that she focuses on widespread algorithms whose composition is unclear and whose impact is destructive in that they disenfranchise people and undermine societal cohesion.

Flawed algorithms also reduce people’s employment opportunities and lengthen prison sentences, especially for minorities, O’Neil said.

For example, she said, algorithms that determine a criminal offender’s recidivism risk use irrelevant data and ask questions that reveal their creators’ prejudices. One such algorithm asks people whether they’ve ever been suspended from school. The algorithm raises the recidivism risk for a “yes” answer, even though minority students are much more likely to be suspended from school, meaning the algorithm introduces a race component.
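The way a seemingly neutral question can carry a race component can be shown with a toy score. The sketch below is not the actual recidivism tool: the one-point-per-“yes” scoring, the two groups and the suspension rates (1 in 10 versus 3 in 10) are invented for illustration.

```python
def risk_score(answers):
    """Toy questionnaire score: one point per 'yes'. Not any real tool."""
    return sum(1 for answer in answers.values() if answer)


def average_score(group):
    """Mean risk score across everyone in a group."""
    return sum(risk_score(person) for person in group) / len(group)


# Two hypothetical groups of 10 people with identical criminal histories.
# Only the school-suspension rate differs (made-up: 1 in 10 vs. 3 in 10).
group_a = [{"prior_offense": True, "suspended": i < 1} for i in range(10)]
group_b = [{"prior_offense": True, "suspended": i < 3} for i in range(10)]

print(average_score(group_a))  # 1.1
print(average_score(group_b))  # 1.3
```

Group membership is never an input to the score, yet the group that is suspended more often ends up with a higher average risk.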

What’s worse, O’Neil said, is that the flawed algorithm, beyond being unfair, makes society less safe. People with a greater risk of committing another crime receive potentially shorter sentences. Meanwhile, people who pose a smaller risk and could potentially be reintegrated into society more easily become institutionalized and are, because of their longer sentences, more likely to commit another crime. ProPublica, an independent, nonprofit investigative journalism organization, wrote about the racist algorithm last year.

O’Neil is calling for the creation of a Hippocratic oath for data scientists, to make them think about their impact on the world. And, she said, algorithms need to be much more transparent, so that people can understand the results and appeal them when they’re wrong.

“We don’t have a perfect society,” O’Neil said. “When we do, we can automate it.”