How Data Science Fights Sexism (and Why It Matters to You)
By Kylee McIntyre, 08-Sep-2017

Is AI racist? In short, if it's advanced enough, yes. At the end of the day, AI is still designed by humans, and human creation reflects the bias of its creators. That's why you get stories about Google Photos tagging Black individuals as “gorillas”, AI lending favor to men when it comes to STEM jobs, and the infamous Tay, a Microsoft chatbot forced offline when individuals on the internet had her posting pro-Nazi slogans in a matter of hours.

Why does this matter? For starters, it means that the machine that you're relying on to give you clear-cut answers is giving you skewed ones. Consequences of using this data can include shutting out an entire demographic from your product or having discrimination become part of your brand.

It's these problems and more that data scientists face every day at work. At City AI's Model Bias event, data scientists from several companies - ranging from ride-share and on-demand services company Go-Jek to insurance firm Allianz - gathered over snacks and drinks at Unilever's Level3 coworking space to discuss how to best identify model bias and eliminate it from data sets.

The speakers included Nvidia AI's Simon See, Allianz and Girls in Tech Singapore's Wan Ting Poh, MSD's Jason Tamara Widjaja, Go-Jek's Zane Lim, and Tenqyu's Jan Daniel Semrau. After each speaker gave a presentation - around ten minutes long - they joined a panel moderated by City AI's Victor Alexiev.

Though the issue they were tackling was complicated and certainly not solvable in one evening, here are some concrete takeaways they brought forward about model bias - important for you whether or not you work in data science.

Start here

Using pictures as an example, Simon points out early on why bias is not only common but natural - it's impossible at this point in time to have all of the perspectives of every individual in the world. “Unless you can have the whole world of data […] unless you have all the images in the whole damn world, it causes bias,” he says.

Even then, the information would still be far from objective, because that data would be composed of data from individuals who see the world through a subjective lens.

Jan also illustrates this point simply - if one were to use data to map out when the best events are happening in Singapore and where, that model could not simply be translated over to work for a country like Indonesia. There are too many differing factors, like size, population, and infrastructure. “You need to understand why data works,” he says. In this case, that means looking at why people go to events, and why those locations are popular.

A great starting point for anyone wanting to tackle biased data is to simply know and accept that it is biased. And other human-created methods of identifying that bias are also going to be flawed. “It's a challenging problem,” explains Simon.

At the end of the day, though, he believes that awareness might matter most. “When we choose the parameters [of the data], we know we're biased. The bias is okay, as long as we know that we are biased,” he says. “We must be able to understand it.”

Jason, meanwhile, notes that sometimes, having a smaller set of data to choose from might be better than having biased data, and suggests eliminating factors like gender and race from the data set in general.

Quick eye, fast fingers

Let's say you've understood that. Now, what do you do about it? In a throwback to a math-heavy science class you might have had in high school, Wan Ting kicks off her presentation by explaining the difference between statistical bias and statistical variance.

Bias vs. Variance
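In plain terms, bias is how far an estimate lands from the truth on average, while variance is how much that estimate jumps around from one sample to the next. Here's a minimal Python sketch of the distinction. It isn't something shown at the event: the two estimators, the true mean of 10, and the sample sizes are all assumptions picked for illustration.

```python
import random
import statistics


def simulate(estimator, true_mean=10.0, sd=4.0, n=20, trials=5000, seed=0):
    """Return (bias, variance) of an estimator of the mean over many samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        estimates.append(estimator(sample))
    # Bias: average estimate minus the true value.
    bias = statistics.fmean(estimates) - true_mean
    # Variance: spread of the estimates around their own average.
    variance = statistics.pvariance(estimates)
    return bias, variance


def sample_mean(sample):
    """Plain average: roughly unbiased, but noisier."""
    return statistics.fmean(sample)


def shrunk_mean(sample):
    """Hypothetical estimator that halves the average: steadier, but off target."""
    return 0.5 * statistics.fmean(sample)


mean_bias, mean_var = simulate(sample_mean)
shrunk_bias, shrunk_var = simulate(shrunk_mean)

print(f"sample mean: bias={mean_bias:.3f}, variance={mean_var:.3f}")
print(f"shrunk mean: bias={shrunk_bias:.3f}, variance={shrunk_var:.3f}")
```

The plain average hovers around the truth but wobbles more. The shrunk version wobbles less but is consistently wrong, which is the trade-off in a nutshell.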

Common fixes to bias include having a bigger set with more data points (increasing your sample size), trying a different set of features, running a partial-dependence plot (PDP), and being aware of bias that can occur around factors commonly used to discriminate in a group of people - race, gender, and marital status, for example.
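To make the PDP idea a little more concrete, here's a rough sketch of the numbers that sit behind such a plot: force one feature to a given value for every row, average the model's predictions, and repeat for each value of interest. The scoring function `toy_model`, its coefficients, and the four rows are hypothetical and stand in for whatever model and data you actually have.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        # Overwrite the feature in every row, leave everything else untouched.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


def toy_model(row):
    """Hypothetical scoring model that (problematically) leans on gender."""
    score = 0.1 + 0.02 * row["years_driving"]
    if row["gender"] == "M":
        score += 0.3
    return score


rows = [
    {"years_driving": 2, "gender": "M"},
    {"years_driving": 10, "gender": "F"},
    {"years_driving": 5, "gender": "F"},
    {"years_driving": 7, "gender": "M"},
]

pd_gender = partial_dependence(toy_model, rows, "gender", ["F", "M"])
for value, avg in pd_gender:
    print(value, round(avg, 3))
```

If the averaged prediction shifts noticeably when only a sensitive feature changes, as it does here, that's a signal the model is relying on that feature.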

None of that is a quick fix, though, and several of the presenters share stories about times when, despite their best intentions, they had to figure out what to do with errors in their data anyway.

Data driving apps

For Zane, data science is a way to make sure the right drivers are getting customer pickups for Go-Jek, which means better business and happier customers. The Indonesian company, which bases its data science team in Singapore, uses a system which Zane describes as a boosted model or boosted tree. The system is designed to reward good behavior on the part of drivers. Better drivers - those on time who don't cancel often - get a little boost when matching with a customer. Staying ahead of bias is very important, particularly in Jakarta, where Zane says that drivers will likely get together and riot if they have an inkling that something about the app isn't fair.
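For the curious, “boosted tree” usually refers to a model built from many small decision trees, each one trained to correct the errors left by the ones before it. The sketch below is a generic, stripped-down illustration of that idea in Python and makes no claim about how Go-Jek's system works: the one-split “stumps,” the cancellation rates, and the completion rates are all invented for the example.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on x that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_boosted(xs, ys, rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to what is still wrong."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and every stump's scaled correction."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left_mean if x <= threshold else right_mean)
        for threshold, left_mean, right_mean in stumps
    )


def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical data: driver cancellation rate -> trip completion rate.
xs = [0.02, 0.05, 0.10, 0.15, 0.25, 0.40]
ys = [0.95, 0.93, 0.90, 0.80, 0.60, 0.40]

model = fit_boosted(xs, ys)
base_mse = mse([sum(ys) / len(ys)] * len(ys), ys)
boosted_mse = mse([predict(model, x) for x in xs], ys)
print(f"baseline MSE={base_mse:.4f}, boosted MSE={boosted_mse:.6f}")
```

Each round only nudges the predictions a little (that's the `learning_rate`), which is why many weak trees together can end up fitting the data far better than a single guess at the average.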

Data that's come out before has shown that men are more likely to default than women, that those who are married are more likely to default than those who are single or who identify in a third category, “other,” and that those who are highly educated (have finished their A levels) are more likely to default than those whose education level is unknown.

However, says Zane, a closer look at the data reveals that the team has to approach this with caution, as the sample size of those with unknown marital status or education levels is much smaller than that of their respective counterparts.
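One way to see why a small group calls for caution is to put a confidence interval around each group's default rate. The sketch below uses the Wilson score interval, a standard formula for proportions. The counts are made up for illustration and are not figures from the talk.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical counts: (defaults, group size).
married = wilson_interval(220, 1000)  # large group, 22% observed default rate
other = wilson_interval(3, 20)        # tiny group, 15% observed default rate

print(f"married: {married[0]:.3f} to {married[1]:.3f}")
print(f"other:   {other[0]:.3f} to {other[1]:.3f}")
```

With these invented numbers, the interval for the tiny group is so wide that it overlaps the one for the large group, so the apparent gap between 22 percent and 15 percent could easily be noise.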

Wan Ting, meanwhile, brings up that it's not always about the scientific things data personnel can carry out - it's more about asking what the business you're working for actually needs.

She, along with the data science team of which she was a part, set out to create a profile of the ideal customers for car insurance. The model, which helped inform who to target, reached 78 percent accuracy, which was good enough to take in for evaluation.

The problem was that the people they decided to target already had car insurance.

It's not the kind of finding that the team would have definitively been able to predict beforehand, says Wan Ting, but more communication between the team and their superiors could have helped. “Ask a lot of questions - ask a lot of questions and think about all these biases,” she says. “What biases are possible?”

The right data scientist for the job

Data science is a fast-growing field, because, as proven by events like Trump's election to the presidency in the US, data means money - and victory. How do you find the right person for the job?

The speakers reveal a wide range of paths into data science. Simon's degree was in applied mathematics. “By accident, I stepped into the world of AI, machine learning, and so forth,” he says.

Meanwhile, Jason, who has been studying data science full- and part-time for years, points out that the diversity of perspectives can be beneficial to the field. “There are two contradictory takes on AI: I need to have three PhDs and study forever and do all these things because it's so hard; [the other is] AI is for everyone and one day everyone will have AI, and so on,” says Jason in his presentation, which goes into the power of big data models to affect the social order.

In his opinion, the truth lies somewhere in the balance. While data science is still a quickly growing field and there won't be enough data scientists to meet demand for a while (though data science is a relatively well-paying profession), Jason's more concerned about the people who serve as their liaisons to the world. “What we've seen is heaps of people want to be data scientists, but no one wants to actually understand what we're doing,” he explains.

In other words, the world needs more analytically literate managers who understand enough about data science to help the field reach its full potential.

Are you a data scientist passionate about making use of the best models? Are you looking for a job that'll help you reach your fullest potential? Or, are you a hiring manager in the market for data team members who really dedicate themselves to the workplace? Check out 100offer, where curated data scientists and other tech talent are finding their dream jobs at innovative companies like Go-Jek.


Kylee McIntyre
American tech, science, health, and environmental writer. Lover of scifi, fantasy, travel, and coffee. Find her on Twitter @ejkyleem.