A conversation on artificial intelligence and gender bias

The world celebrated Women's History Month in March, and it is a timely moment for us to look at the forces that will shape gender parity in the future. Even as the pandemic accelerates digitization and the future of work, artificial intelligence (AI) stands out as a potentially helpful—or hurtful—tool in the equity agenda. McKinsey recorded a podcast in collaboration with Citi that dives into how gender bias is reflected in AI, why we must consciously debias our machine-human interfaces, and how AI can be a positive force for gender parity.

Ioana Niculcea: Before we start the conversation, I think it’s important for us to spend a moment assessing the amount of change that has taken place with regard to AI, and how the pace of that change has accelerated over the past few years. And many people argue that in light of the current COVID-19 circumstance, we’ll feel further acceleration as people move toward digitization.

It’s really incredible. I spent the past eight years in financial services, and it all started with data. Datafication of the industry was sort of the point of origin. And we hear often that over 90 percent of the data that we have today was created over the past two years. You hear things like every minute, there are over one million Facebook logins, 4.5 million YouTube videos being streamed, and 17,000 different Uber rides. There’s a lot of data, and it is often said that only 1 percent of it is being analyzed today. AI is really the way that we can actually make sense of all of that, process all of this information, analyze it, and transform it into actionable insights. AI is becoming more powerful. I think people are familiar by now with Google DeepMind’s AlphaGo, which defeated the famous Go player Lee Sedol. That’s already becoming obsolete. We see Facebook’s Pluribus bot essentially beating top professional poker players in six-player Texas Hold’em.

AI is becoming more powerful and more important. So, having taken a step back and thinking about the amount of change that we’re seeing, I’m going to move on to our panelists and ask them about how they see AI impacting the workforce and the role of women in particular, because that’s ultimately what we’re here to talk about today.

Anu Madgavkar: Maybe if I jump in: I think you framed the opening question very well, because it’s certainly true that AI and related technologies, like automation, will have a very profound effect on the world of work going forward—and women are going to be among the most impacted groups who are going to have to learn how to deal with and adjust and adapt and learn how to use many of these technologies. So, even prior to the pandemic, when we had looked at the impact of AI and automation on the world of work for women, what we found is that across the countries we studied, up to about 160 million women would need to make occupational transitions because the nature of work they were doing in their current occupations would change; machines would actually do that work. And in order to stay employed, women would need to find new kinds of jobs to do.

But it’s not just the transitions, it’s also the fact that a very large number of women are going to need to learn to work with technology. Their current jobs are going to involve much more interface with applications that use AI, whether you’re a nurse in a hospital that’s using AI to support diagnosis or treatment of patients or a personal financial adviser in a bank who’s using AI to do some level of analysis, women are going to need to know how to do that and build those kinds of skill sets as well. And as you said, COVID-19 accelerates and exacerbates this to a great extent. By our estimates, again, the number of occupational transitions women will have to make might grow by 25 percent or more going forward, simply because COVID-19 has been more disruptive, and the adoption of AI and automation is likely to rise even further going forward. So it’s not something we can ignore. And we absolutely have to understand both the potential as well as some of the pitfalls of how AI technologies impact women in the workforce.

Ioana Niculcea: I think that’s exactly right, and I think it’s important to emphasize that this also comes at a time when corporations across industries are becoming much more attuned to the diversity and inclusion agenda. And this is also elevating from a much more administrative check-the-box exercise to something that’s becoming a very strategic decision for the company or board-level agenda item, if you will. So, when you put together those big structural impacts that you mentioned on the workforce, the need to make very significant occupational transitions, and to learn to work together with the AI in essentially a new analytics framework, and you put that alongside that new agenda item that’s very important for industries all around, I think we definitely need to talk about these issues.

And I know, Muneera, you looked extensively around what our sources of bias are with regard to AI and gender. What have you seen there that will be important to highlight as to where things can go wrong?

Dr. Muneera Bano: Thank you, Ioana. The way I help people understand biases in AI is to have them think of algorithms as an imitation of the human brain. It’s as simple as that if you want to look at it that way, and it’s more complicated in other ways as well. So . . . there are two ways I would put it in simple terms for people to understand how biases can make their way into the algorithms. The first is, of course, as we already mentioned, the data. Just as a human child is born with no existing knowledge of any phenomena and comes to understand reality by observing and learning, this is how algorithms learn from the data. So, whatever data is provided to them and however they are trained, they will pick up any biases and any patterns in it; the only question is how much intelligence there is in the algorithm to rectify those biases. If there is none, then of course they will repeat those patterns and magnify them with their computational power.

We have seen examples of many algorithms that learn from the data and then exhibit those patterns in their output. If there is sexism embedded within the data, they will pick up that pattern and exhibit the same sexist behavior in their output. And unfortunately, the workforce in AI is male dominated. Only 20 percent of employees in technical roles at major machine learning companies are women. According to the UNESCO report, only about 12 percent of artificial intelligence researchers are women, and 6 percent of professional software developers in the field of artificial intelligence are women.

So predominantly, then, you have a male workforce working on these algorithms, defining the rules. The bias may not be intentional, but it’s human nature. Even if you had a workforce dominated by women designing something, they would tend to design it in a way that suits them. Unconsciously, you are adding your biases into the algorithms; you are creating those rules.

As human beings, we are not free of biases. We all carry them with us. We are who we are: our background, what we have learned. And when we are creating software, there is a possibility that we will transfer them. If an algorithm is producing a discriminatory result against a particular class of society based on gender or race or ethnicity or economic factors, that’s where we need to be careful about how these biases are affecting the decisions that these AI systems are making, unless we have super-intelligent algorithms that will learn to rectify [them] on their own. Until then, as software developers, as companies or governments, we have a responsibility to make sure that these biases are not creating discriminatory behavior toward women or people of different races.

Anu Madgavkar: And I think Muneera raises a very important aspect of the question, which is, what do we really mean when we say something is discriminatory? Or what do we mean when we say something is fair? I think a lot of the research and thinking around bias in AI is also trying to tee that up, to say, how do you really define fairness? And just to give you an example, there was quite a seminal study done in the context of racial inequality and discrimination with very profound implications, because one of the big application areas for AI in the US, for instance, is the criminal justice system, where AI platforms and algorithms are used by judges, probation officers, or parole officers to do risk scoring for criminal defendants and try to predict the likelihood that a particular person will reoffend. So the risk of reoffense is actually guiding real decisions by judges and probation officers.

An investigation into the accuracy of one of these platforms revealed that African American defendants were twice as likely to be incorrectly characterized as high risk for violent reoffense. The false-positive rate was twice that for white defendants. And the reason I bring up this example is that the question of what fairness is gets kind of summarized here, because in defense of the algorithm, its true-positive rate, so its ability to predict correctly, was actually on par across African American and white defendants. It was pretty accurate and was not biased on the true-positive side. But when you looked at it on the false-positive side, there was indeed a bias. So it raises the question as you think about AI and its applications, what are you really trying to optimize for?
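The distinction between the two rates can be made concrete with a little arithmetic. The sketch below is illustrative only: the counts are invented (they are not figures from the investigation mentioned above) and the `Confusion` class is a hypothetical helper. It shows how two groups can have the same true-positive rate while one group's false-positive rate is double the other's.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of risk-score outcomes for one group (all numbers invented)."""
    tp: int  # flagged high risk, did reoffend
    fn: int  # flagged low risk, did reoffend
    fp: int  # flagged high risk, did NOT reoffend
    tn: int  # flagged low risk, did NOT reoffend

    @property
    def true_positive_rate(self) -> float:
        # Share of actual reoffenders who were correctly flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def false_positive_rate(self) -> float:
        # Share of non-reoffenders who were wrongly flagged as high risk.
        return self.fp / (self.fp + self.tn)


# Hypothetical groups: identical true-positive counts, different false positives.
group_a = Confusion(tp=60, fn=40, fp=40, tn=60)
group_b = Confusion(tp=60, fn=40, fp=20, tn=80)

for name, group in (("A", group_a), ("B", group_b)):
    print(name, group.true_positive_rate, group.false_positive_rate)

fpr_ratio = group_a.false_positive_rate / group_b.false_positive_rate
print("false-positive rate ratio (A vs. B):", fpr_ratio)
```

Which of these rates (or some other metric) should be equalized is exactly the "what are you trying to optimize for" question; the code only measures the gap, it does not settle it.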

And the other question is, is fairness something that is close to the decisions humans would have made in the absence of AI? Probably not. We are seeking to improve and de-bias human decision making by the use of AI, so we should not mimic exactly what the track record of human decisions has been in the past.

And at the same time, should fairness be held against the bar of, well, what’s the objective data out there? For example, a lot of facial recognition AI software and image recognition technologies, if you search for a particular occupation, will throw up hundreds of images that match that occupation. So one study looked at different occupational keywords: what’s the mix of genders that are thrown up by common search engines in terms of the images? And what they found was that for highly gendered occupations, like nursing, for example, where 80 percent-plus workers are women, the images thrown up also sort of mimic that mix quite well. But for an occupation like CEO, only about 10 or 11 percent of the images were female images. Whereas in reality, the data in the US at that time had 27 percent of CEOs as women.

So there’s something about the data that’s popularly built and used and reused and propagated that doesn’t really reflect the data of the real world. And therefore, some of these biases get further amplified and further fed into each other because it’s these very images that are, again, reinforcing people’s perceptions of what the reality is.

Dr. Muneera Bano: I would like to concur with the point Anu made here that using historical records to train AI without being cautious about these biases is like repeating history, but this time with more powerful tools.

Ioana Niculcea: I think it’s really interesting because two things come to mind based on what both of you are saying, Anu and Muneera. One is that it essentially seems that we need to learn how to work with AI, including to de-bias its decisions; and two, relatedly, where we’re seeing a lot of time and resources being invested at the moment is in developing explainable AI, because to a large extent AI has been considered somewhat of a black box so far. And at least from the financial service industry, that has been problematic for a number of reasons, which include explaining decisions to regulators, and that could have implications for gender bias that are quite significant in terms of things like credit approval for different demographic groups, but also in terms of being able to explain how a certain AI-based product works to your customer.

I wanted to see if there’s any more real-life examples that you had, because I know that that always helps make the discussion much more tangible, and particularly as we seek to understand what kinds of things we would eventually be able to implement to make positive change.

Dr. Muneera Bano: A very classic example is Google Translate. I am of Persian and Pashtun descent, and our language is gender neutral. So, in Google Translate, you can put in some statements that have gender associated with them, like, "She’s president. He is cooking." These are two very simple sentences; when you translate them into a gender-neutral language, like Farsi or let’s say Turkish, they become, "This person is president, and this person is cooking." And then if you ask Google Translate’s algorithm to translate them back and provide a gender for both of them, it will always put, "He’s president and she’s cooking," even though initially it was the other way around. It won’t pick he for both, and it won’t pick she for both. That’s because the statistical probability based on the training-set data is that it’s most probably he who is going to be the president and she who is going to be cooking.
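A toy illustration of that statistical effect follows. It is not how Google Translate actually works; the corpus, its counts, and the `most_likely_pronoun` function are all invented for the sketch. The point is only that a system that picks the most frequent pronoun seen with each activity in its training data will restore a gender based on past frequency, whatever the original sentence said.

```python
from collections import Counter

# Invented "training data": (pronoun, activity) pairs with a skewed distribution.
corpus = (
    [("he", "president")] * 9
    + [("she", "president")] * 1
    + [("he", "cooking")] * 3
    + [("she", "cooking")] * 7
)


def most_likely_pronoun(activity, pairs):
    """Return the pronoun seen most often with this activity in the data."""
    counts = Counter(pronoun for pronoun, act in pairs if act == activity)
    return counts.most_common(1)[0][0]


# The gender-neutral source ("this person is president / is cooking") carries
# no gender, so the choice is driven entirely by frequencies in the corpus.
for activity in ("president", "cooking"):
    print(activity, "->", most_likely_pronoun(activity, corpus))
```

Rebalancing the invented corpus changes the answer, which is why the data, rather than the selection rule, is where the bias lives in this sketch.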

They tried to address this issue by providing both translations, which does not solve the problem of the data bias; the data bias is still there, it’s just that they put a lid on top of it. This example shows what happens on a very small scale. It’s not a very big problem, but there are other examples where the problems could be more serious. In medical science, medicines are tested predominantly on men and then recommended for all patients, regardless of their gender; sometimes they don’t work for women. There is a really great book, Invisible Women by Caroline Criado Perez, that talks about the impacts of the data gaps that are created in ordinary life. It covers even car safety systems that are tested only on men, or driverless cars and how they are going to address the issues of sexism and racism. There are plenty of examples in real life where these algorithms, if we do not properly and cautiously address these issues, can have far more fatal consequences as well.

Another very classic example within the field of machine learning was an image of Lena [Söderberg], from the ’70s. The picture was taken from a Playboy magazine, and it was used widely throughout the machine learning community, predominantly to test image-processing algorithms. Only recently has there been a push to remove that image, because it is something that reinforces the gender bias within the community. How do you portray the whole image? That’s one of the reasons that voice assistant systems, whether Amazon’s Alexa, Cortana, or Siri, all have female voices by default. And this also creates the psychological impression of someone who is submissive, who has to follow orders; and when there’s a failure of software or hardware, a female voice is the face of that failure.

And so, this is kind of reinforcement within the tech industry: how you want to portray the image of women, whether it’s designing the algorithms, whether it’s using the voices, using the images. And use of images has already been exhibited in one example where an artificial-intelligence algorithm was asked to do a beauty contest, and it did not pick any women of darker skin, considering that looking at the pattern of data in the past the beauty standard was very predominantly skewed toward white and blond women.

So it’s all coming down to the data and the rules, and those are coming from humans. We as humans have the responsibility of how we design them, how we provide this ethical framework, just like Anu said, the fairness. How do we define this fairness? What do we have, this ethical framework around it? These recommendations, different cultures, different countries will have a different definition of what they think of as fair or ethical. And that’s where I think the real problem is. These technologies, AI algorithms, they are tools in the end, but who designed these tools and how do we use these tools, that’s what matters.

Anu Madgavkar: And it really matters because of the sheer power of AI as well, just in terms of the processing and computational capacity. This is a real challenge when you think about many aspects of the workforce. All these decisions have real economic implications and outcomes, because with the explosion of both people as well as data and profiles, what we’re really finding is that for every job vacancy, there are literally thousands of job applicants or more. And just doing that process of filtering down and selecting the right candidate is increasingly very hard to do in a way that is not AI and data enabled. So we are going to need such platforms for many sorts of really important economic decisions, like hiring or interviews, or applications to get into, let’s say, professional schools. Access to the education and skilling system is going to be intermediated by AI algorithms, if you will. And that brings us back to this notion of how the algorithms are actually perceiving people, because ultimately, these algorithms will use tools that look at combinations of past data but also images, voice, and nonverbal cues.

A lot of technologies are being introduced to do these kinds of screenings. And one study found that, again, as we look at the context of the data set, there were images of members of Congress in the US. The images were run through various AI platforms to see what sorts of labels were annotated to those images, because those labels actually are a way of saying how is the AI responding to that image; and this is actually what gets fed then into various kinds of screening software. All these senators and members of Congress were dressed identically; these were their official photographs very often with the national flag behind them in their office.

But the female images had three times the level of labels that talked about their physical attributes: the color of their hair, their hairstyle, whether they were smiling or not, for example, and labels such as “teen” or “kid” or “girl” much more for the female members of Congress. Whereas the males had labels like “official” or “attorney” or “senior executive,” which were more empowering and noted a higher level of power and social status than many of the labels that were ascribed to women. Again, it’s not the AI’s fault, it is just a mirror as to how society is also viewing individuals of different gender in this case, but we do need to reveal and make more transparent how these analyses are being done and how to interpret some of the outcomes and decisions. And that’s critical because AI has tremendous potential for good as well, which we absolutely must harness.

Ioana Niculcea: I think that that is critical, Anu, and I’m glad that you are taking us in that direction. I’d love to get both of your thoughts around how we can make use of AI and leverage it to improve the current gender agenda, if you will. And I’ll start with one example from our work that relates to what you brought up earlier, Anu, around using AI for talent sourcing, talent management, and talent development. We’ve looked at a number of AI screening platforms that are emerging to basically have people play neuroscience-based games to assess their cognitive and sometimes personality traits to evaluate whether they’re a good match for a particular role. And these platforms realize that to some extent, the profiles of the top performer in a certain job that they will assess candidates against will be based on the profile of the current top performers in the corporations that are utilizing those platforms.

And they’ve instituted ways to essentially put in a gender overlay, where they check that different groups, and not just gender groups but various demographic groups, have the same pass rate for these neuroscience-based games, because they want to ensure a certain level of auditability for the AI and start implementing mechanisms to control the ultimate impact. And we’re seeing an increase in the use of such platforms and, I would argue more importantly, in the use of this mindset in how executives actually work with AI. And I know we’ll talk about this more, but I would just love to get your thoughts on examples that you have seen, and maybe I’ll start with you, Muneera, in terms of AI being a force for good for the kind of change that we’re starting to make.
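A pass-rate check of this kind can be sketched in a few lines. Everything here is an assumption made for illustration: the outcome data is invented, the function names are hypothetical, and the 0.8 threshold is borrowed from the "four-fifths" rule of thumb in US employment-selection guidelines rather than from anything the platforms in this conversation are known to use.

```python
def pass_rates(outcomes):
    """Map each group to its share of candidates who passed the screen."""
    return {group: sum(results) / len(results) for group, results in outcomes.items()}


def parity_ratio(rates):
    """Lowest group pass rate divided by the highest (1.0 means equal rates)."""
    return min(rates.values()) / max(rates.values())


# Invented screening outcomes: True = passed, False = did not pass.
outcomes = {
    "women": [True] * 30 + [False] * 20,  # 30 of 50 pass
    "men": [True] * 40 + [False] * 10,    # 40 of 50 pass
}

THRESHOLD = 0.8  # assumed audit threshold (four-fifths rule of thumb)

rates = pass_rates(outcomes)
ratio = parity_ratio(rates)
needs_review = ratio < THRESHOLD

print(rates)
print("parity ratio:", ratio, "needs review:", needs_review)
```

Running a check like this on every demographic split after each screening round is one simple way to get the auditability described above; what to do when a gap is flagged remains a human decision.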

Dr. Muneera Bano: The very first advantage, I would say, of having AI is that it has brought forward this problem of gender bias in the data. Because of the sheer amount of data, we would not have been able to see the patterns that clearly the way that these AI algorithms have shown to us.

So I think the very first advantage that we took from AI was that it made us conscious of this bias. Besides that, I work with robots, and I teach robotics at Deakin University, and the very first thing people always say is that robots will take away jobs from many. I don’t think these algorithms will take away jobs; this is an opportunity. And in the post-COVID era, I think, with a lot of businesses going online and less movement for the foreseeable future, we will have more data coming into the virtual space, and these algorithms are needed there. I will try to connect this with the example that Anu mentioned a while ago about the selection of leaders and the perception that’s created around leaders.

I think social media platforms play a huge role in how people post their opinions and in a lot of algorithms, because it’s a cheap source available to train the algorithms easily for small AI algorithms. And there are a lot of bad opinions, when you look at the comment sections or read the Twitter threads. How do we contend with that kind of data? Because in the past, I always say that history was written by the victors. Now it’s written by those who have the keyboard and access to the social media platforms. This is the digital history that we will leave behind and this is what the future algorithms are going to understand.

So I would say that this perception about leadership, how these algorithms are going to look at women, their role in society, it all comes from us. And these algorithms make us look into the mirror of what we are doing as a society. I think that’s one of the greatest things that AI has done, whether it’s in social media, whether it’s in the algorithms for recruitment, justice system, defense, education sector, anywhere you see. If there is a behavior that the algorithm is going to exhibit, we as humans will see that behavior more clearly than we would evaluate ourselves for that exact behavior. So to me, that’s one of the greatest things that the AI has done: show us a mirror of what we are as a society.

Anu Madgavkar: And I think in the context of emerging economies in particular and even Asia, for example, both realities are true. One is that women in many emerging economies, women in Asia, are less represented in many parts of the workforce. They’re less represented in terms of financial inclusion. And many women, almost 50 percent of women in low-income countries, even lack the basic means to identify themselves digitally or even, indeed, any form of identification. So, if you think about the empowerment gap that women have in many emerging countries, this is quite large. But it’s equally true that the emerging economies are the ones where we are genuinely seeing leapfrogging in terms of digital and internet access, the kinds of data usage, very often the kinds of digital infrastructure that are being created or built.

India is one example, which has really committed to both digital ID as well as open API stacks to enable and facilitate digital payments. So, while the empowerment gap is huge for women, the potential to also leverage all of this digital infrastructure, the flows and the data that go with it, to solve some of these basic issues that women face is also huge.

And one very promising area is actually around credit. There’s a huge unmet need for credit in the informal segments of most emerging markets. People in those segments don’t have established credit scores or paperwork to suggest they are creditworthy. But there is data around the kinds of transactions they do. One model looks at how people pay their mobile phone bills: what does that data tell you? Or in India, there are digital payment flows for a small vendor or a small-business owner. If you actually capture that kind of payment-flow data, then that helps AI do credit scoring that leverages the digital footprint of the person as opposed to a set of formal records.

And that is incredibly empowering, if it’s done well. But we do need, as you said, Ioana, the explainability factor to say, "OK, if a credit decision comes out in a certain way, how do you understand why this was the case?" And related to that, I think a lot of algorithms and developers are thinking about the so-called counterfactual example, which is, if this credit decision had to be different, then what would you need to believe about the underlying attributes, just to help people understand what’s really driving a certain AI-enabled decision, in this case a credit-related decision? Even to help customers or to help auditors and inspectors who will need to look at AI and what it’s really doing, to help them understand what are the binding constraints, how would changing those binding constraints actually change the decisions that the algorithm was making? Much more effort needs to be put into understanding things at that level.
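The counterfactual idea can be shown with a deliberately tiny model. This is a sketch under stated assumptions, not a real credit-scoring method: the two features, their weights, and the approval rule are all hypothetical, and a linear score is used only because it makes the "what would need to change" arithmetic easy to follow.

```python
# Hypothetical linear scorer built on digital-footprint features.
WEIGHTS = {
    "on_time_bill_share": 4.0,        # share of phone bills paid on time (0 to 1)
    "monthly_inflow_thousands": 0.5,  # digital payment inflows, in thousands
}
BIAS = -4.0


def score(applicant):
    return BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def approved(applicant):
    # Assumed decision rule: approve when the score is zero or higher.
    return score(applicant) >= 0


def counterfactuals(applicant):
    """For each feature, the value it would need (others held fixed) to reach approval."""
    gap = -score(applicant)  # how far the score is below the approval line
    return {name: applicant[name] + gap / weight for name, weight in WEIGHTS.items()}


applicant = {"on_time_bill_share": 0.5, "monthly_inflow_thousands": 2.0}
needed = counterfactuals(applicant)

print("approved:", approved(applicant))
print("would be approved if either feature reached:", needed)
```

An explanation in this form ("approval would have required 75 percent of bills paid on time, or inflows of 4,000") is something a customer or an auditor can act on. Real models are rarely linear, and a suggested value can fall outside what is feasible for a person, so production counterfactual tools need extra constraints that this sketch leaves out.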

Ioana Niculcea: It’s interesting. What you’re bringing up is sort of a mindset shift that needs to happen, and I think that’s really the way that we should bridge into what are some things that we should do: is this mindset shift, is it investing in AI explainability to help deploy AI as a force for good as it relates to gender? And you both brought up, Muneera and Anu, the point around utilizing AI to make sense of all of this data, which we did not have available before the digital footprint, as you called it. And we’ve seen similar efforts within corporations to deploy AI to basically provide an organizational MRI, if you will, around digital exhaust—email and chat data—and look at how women and men form networks differently within their organization, in terms of breadth, in terms of strengths, in terms of do they go cross functionally, do they tend to have enough senior support; and then use these metrics captured by the AI based on digital exhaust to understand where there’s room for intervention, if you want to help support women to get promoted and become seniors.

So I think it’s certainly a theme that’s resonating not just in the credit example that you provided but also in other areas. And I think it’s critical that we’re making steps that way. So, we’ll start with you, Muneera, and then Anu, we’d also love to get your thoughts as to how we make this actionable and what should business executives and other different stakeholder groups, maybe the data scientists and data engineers, do to make things better?

Dr. Muneera Bano: So, to me, it’s a multilevel solution. There is no one fix for all, because every AI exists for a different, specific purpose. Its objective is set within a company, within an organization, or within a country, if it comes from government projects and initiatives. So, first of all, there is the individual level. As a software engineer or data scientist, when I am sitting and working on something, what are the values and morality that will define what I am doing? I have to be conscious about not letting my personal biases get in the way of the design.

Second would be the company culture. That’s another level. What ethical framework have they provided for their developers to make sure that it is being followed? So that culture around the company, how they define it, how they put the recommendations there: that could be fairness. We discuss the fact that fairness is also contextual, different companies will come up with their own definition. This is an organization-level solution, and then comes, I think, the culture overall that belongs to a country. Different aspects around the ethical framework in different countries will be defined differently. So, within all of that, this is not a single fix that we can make, but everyone on their own level is responsible for what they are deploying, what they are creating, and what will be used to make decisions.

I can refer back to the recommendations that were given by UNESCO to combat the gender bias in applications using artificial intelligence. And they are recommending to the companies and the governments to, first of all, end the practice of making these gendered rules for the AI like voice assistants and other stuff. So, by default, you are pushing a gender stereotype within the tech company. Just like we want to adopt a gender-neutral language within the society to be more inclusive to the diversity that we represent, the same should come with the tech products as well, which are AI solutions—that they need to understand that the psychological impact products create on people by using female voices, by using female stereotypes, these need to stop immediately.

That’s the first thing that would remove a lot of psychological stereotyping, that IT is a male-dominated field, AI algorithms project female stereotypes about their roles in the society. So this is a very simple thing that can be done easily. It’s not an issue that needs a bigger debate. I think that’s the first step. And then we move toward a better culture that will be reflected within the country or within the organization.

Anu Madgavkar: I would agree with what Muneera says, and I think for companies, individuals, and all stakeholders, there are probably three buckets of things that are pretty much must-dos. They’re both sort of mindsets and attitudes, but also practices that we can embed into our day-to-day life.

One bucket is really about the spirit of a human engagement with AI. This is not something that AI is doing on the side; it's very much the spirit of humans in the loop type of decision making where AI is a tool that helps show you something, helps suggest something. But it’s also very incumbent on a set of human beings to understand, interpret, maybe make decisions, and be very actively engaged in that process and not think about it as something divorced from their own role. And associated with that is encouraging more fact-based conversations to educate people in the company about how all of this works, for example, and raising the ability and the awareness of people in the organization around many of these tools.

The second is possibly a set of good practices that we can implement along the whole AI-application value chain. So, right from when you think about training the AI on a data set, what are some practices to really reveal in a transparent way the quality of the underlying data? We should have a clearer set of commonly agreed upon or well-developed metrics around what a good data sheet looks like and what questions to ask when you’re thinking about the quality of the data. Similarly, when you’re thinking about the quality of the model, what are the questions you should be asking? At what point should you ask? Should you be implementing rigorously a sort of audit and a learning module at the end or midpoint and end of any AI rollout to say, "OK, what have we really learned from the experience and the outcomes that this particular application has generated?" So I think a set of such practices is pretty critical.

And then I think, finally, there’s a very strong foundational thing we need to act upon, and it takes on computer science at the highest level. It might be as simple as this: in the education system, as more people are getting really smart on the technical aspects of AI and data science, can we also raise their awareness and ability to think about the ethical issues associated with them, and almost make that a mandatory, core, and valued part of the curriculum, not just a check-the-box exercise or something that’s nice to do but doesn’t really matter in terms of how this professional is going to be built? So AI governance overall is a huge priority for the way companies and, indeed, all stakeholders have to think.

Ioana Niculcea: I think that’s all interesting. And I know, Anu, that a number of universities are already implementing such courses that teach this mindset alongside the technical material, and that’s really important. I want to thank you both for a fascinating conversation, ending on this note of actual recommendations: what we can start doing tomorrow if we want to improve things. I know this has been really insightful, first of all, around the challenges that we are experiencing around AI and gender bias, but I think I’ll leave here optimistic that we can mostly turn things around and use AI as a support for good. And as we all engage with AI as part of this new analytic framework, we’ve identified a set of practices. It’s sort of basic hygiene, if you will, in terms of making sure that the data is the way it should be and that we are appropriately feeding this data to the AI, which I think you both identified as a tool at the end of the day in the context of the “human in the loop” framework. But it’s also a necessity to acknowledge this issue at the organizational and policy level, not to mention the individual level.

Anu Madgavkar: And then be really open to that mindset shift and to turning things around. So it’s been a fascinating conversation for me, and I’ll leave with a lot of illustrations and also a lot of insights that we can all act on. So thank you very much for your thoughts.

Ioana Niculcea: Thank you. It’s wonderful to be a part of this.

Dr. Muneera Bano: Thank you for having me here.