The answer, of course, is that the surgeon is his mother.
The riddle is used to reveal the hidden prejudices we hold about different groups of people, in this case, women. More recently, however, scientific investigations of prejudice have expanded to examine not just a single sociodemographic group by itself, defined by gender, say, or race, but the interaction of multiple, intersecting groups at once (e.g., groups defined by gender and race at the same time).
“White, rich women experience a different world than white, poor women. Similarly, Black women experience a different world than Black men, and so on,” says Tessa Charlesworth, assistant professor of management and organizations at Kellogg. “Because of this, it’s important to understand how to study this intersectionality at scale.”
A new tool developed by Charlesworth, Kshitish Ghate of Carnegie Mellon, Aylin Caliskan of the University of Washington and Mahzarin Banaji of Harvard provides a way to do just that.
Using their new method, Flexible Intersectional Stereotype Extraction, or FISE, the researchers analyzed 840 billion words of text to uncover biases around different intersectional groupings of people.
In addition to providing a proof of concept for their process, the researchers’ initial analysis found that historically powerful groups—wealthy and white—dominate in terms of how often they are discussed. The words used to describe these groups are also overwhelmingly positive, while the words used to describe historically ignored groups tend to be negative.
The best of both worlds
For decades, scholars have viewed language as a reflection of our biases, the idea being that how we talk about something opens a window into how we think about that thing.
With this rationale in mind, social scientists have successfully mapped single characteristics, such as gender, against stereotypes. Mapping the intersection of many characteristics, however, has proven more difficult. Computer scientists, meanwhile, have had more success mapping stereotypes to intersectional characteristics, but their methods are more technically complex and computationally intensive and typically rely on group information encoded in names (e.g., Jamal vs. John), leaving unexplored the more hidden groups (that is, those not easily decoded from names), such as sexual orientation, religion, or age.
FISE addresses these shortcomings. “It’s a much lighter model than what’s typically used to look at intersectionality,” says Charlesworth. “This means it does not require heavy computing resources and is more flexible for use in different language environments. For example, with this tool, we will be able to flexibly examine language intersectionality in non-English languages or even historical texts from 200 years ago.”
The tool works as follows. First, researchers identify the terms they want to study: descriptive traits such as warm, cold, enthusiastic, friendly, and so on. Second, drawing on word associations learned from huge text collections such as Wikipedia and Common Crawl, the model calculates how closely these trait words are related to terms along a first group dimension, such as social class (with words like rich and wealthy signaling wealth, and poor and needy signaling poverty). Third, the model calculates the difference between how strongly a trait such as warm is associated with wealth as opposed to poverty. Finally, the same process is repeated along a second group dimension, such as race. These differences in how warmth relates to both race and class can then be mapped into four quadrants (rich–Black vs. rich–white vs. poor–Black vs. poor–white), revealing intersectional stereotypes.
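Conceptually, the quadrant logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: it assumes pre-trained word embeddings (for example, vectors trained on Common Crawl), stubs the embedding lookup with random vectors so the snippet runs on its own, and uses hypothetical word lists for the class and race poles.

```python
import numpy as np

# Minimal sketch of the quadrant logic described above -- not the authors'
# implementation. The embedding lookup is stubbed with random vectors; in a
# real analysis, vec() would return pre-trained word vectors.
rng = np.random.default_rng(0)
_cache = {}

def vec(word, dim=300):
    # Stand-in for a real embedding lookup; replace with actual vectors.
    if word not in _cache:
        _cache[word] = rng.normal(size=dim)
    return _cache[word]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pole_score(trait, pole_a, pole_b):
    # Mean similarity to pole A minus mean similarity to pole B:
    # positive values mean the trait leans toward pole A.
    sim_a = np.mean([cosine(vec(trait), vec(w)) for w in pole_a])
    sim_b = np.mean([cosine(vec(trait), vec(w)) for w in pole_b])
    return sim_a - sim_b

# Two group dimensions, each defined by opposing (hypothetical) word lists.
CLASS_POLES = (["rich", "wealthy", "affluent"], ["poor", "needy", "impoverished"])
RACE_POLES = (["white", "caucasian"], ["black", "african"])

def quadrant(trait):
    # The sign of each differential places the trait in one of four quadrants.
    class_lean = pole_score(trait, *CLASS_POLES)  # > 0: "rich", < 0: "poor"
    race_lean = pole_score(trait, *RACE_POLES)    # > 0: "white", < 0: "Black"
    return ("rich" if class_lean > 0 else "poor",
            "white" if race_lean > 0 else "Black")

for trait in ["warm", "cold", "enthusiastic", "friendly"]:
    print(trait, quadrant(trait))
```

With real embeddings in place of the random stub, each trait’s quadrant simply reflects which side of each group dimension it sits closer to in the language data.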
Although most of the analyses conducted by Charlesworth and her colleagues used only two dimensions, “in theory, you could extend this analysis to as many group dimensions as you want. We could simultaneously look at race, class, and gender, or even more dimensions,” she says. “Interpretation becomes more difficult, however.”
How prejudices are reinforced
Now that the researchers had built their tool, they wanted to test it against a “ground truth” by analyzing how accurately it linked gender and race to 143 different occupations.
To do this, they ran FISE, classifying each of these occupations into four quadrants: white–male, white–female, Black–male, and Black–female.
They then cross-referenced the FISE results with a 2022 Bureau of Labor Statistics report. If the model worked as it should, then occupations associated with Black workers in the real world should show the same associations in the language analyzed by FISE.
The tool worked. FISE found, for example, that 59 percent of occupations are associated with white men, while the BLS lists 48 percent. It found that 9 percent of occupations are associated with Black women, compared with 5 percent in the BLS data. The authors note that although these figures are not identical, they are also not statistically different. Similar levels of accuracy were found for gender- and race-by-class comparisons.
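As a rough sketch of this validation step, the comparison amounts to tallying FISE’s quadrant assignments and lining the shares up against an external benchmark. The occupation list, labels, and numbers below are placeholders for illustration, not the study’s data.

```python
from collections import Counter

# Hypothetical FISE quadrant assignments, one per occupation analyzed.
fise_quadrants = {
    "engineer": ("white", "male"),
    "nurse": ("white", "female"),
    "driver": ("Black", "male"),
    "aide": ("Black", "female"),
    # ... one entry for each occupation studied
}

counts = Counter(fise_quadrants.values())
total = sum(counts.values())
fise_share = {quad: 100 * n / total for quad, n in counts.items()}

# Benchmark shares would come from real labor statistics; placeholders here.
benchmark_share = {
    ("white", "male"): 48.0,
    ("white", "female"): 35.0,
    ("Black", "male"): 7.0,
    ("Black", "female"): 5.0,
}

for quad, bench in benchmark_share.items():
    print(quad, f"FISE: {fise_share.get(quad, 0.0):.0f}%",
          f"benchmark: {bench:.0f}%")
```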
Having demonstrated the fit between FISE and the occupational data, the researchers then used the model to analyze how qualities such as honest, courageous, and greedy correlate with race, gender, and class. The model revealed two key insights.
“The first is what we call the dominance of powerful groups,” says Charlesworth. “When you look at intersectional groups like white men versus Black women, you see really clear patterns of white men dominating the language space, purely in terms of the percentage of traits associated with that group.” Fifty-nine percent of all the traits analyzed were associated with white men, while just 5 percent were associated with Black women; 30 percent were associated with white women and 6 percent with Black men.
The second insight was that traits associated with white men and women are largely positive, while those associated with Black men and women tend to be negative. (To determine this, given the lopsided dominance of white men in the linguistic space, the researchers first assigned an equal number of descriptive traits to each of the four groups.)
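The two summary measures described above (how often traits land in each quadrant, and how positive those traits are on average) could be computed along the following lines. The traits, quadrant labels, and valence scores here are invented for illustration; in practice, valence would come from an established word-valence norm rather than these made-up numbers.

```python
from collections import defaultdict

# Hypothetical inputs: each trait with its assigned quadrant and a valence
# score in [-1, 1] (negative = unpleasant, positive = pleasant).
traits = {
    "competent": (("white", "male"), 0.6),
    "greedy": (("white", "male"), -0.4),
    "warm": (("white", "female"), 0.8),
    "hostile": (("Black", "male"), -0.7),
    "resilient": (("Black", "female"), 0.5),
    # ... one entry per trait analyzed
}

by_quadrant = defaultdict(list)
for trait, (quad, valence) in traits.items():
    by_quadrant[quad].append(valence)

total = len(traits)
for quad, valences in by_quadrant.items():
    share = 100 * len(valences) / total           # frequency: group dominance
    mean_valence = sum(valences) / len(valences)  # quality: average positivity
    print(quad, f"{share:.0f}% of traits, mean valence {mean_valence:+.2f}")
```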
That said, when the researchers introduced the dimension of class into their analysis, they found that intersectional biases involving social class showed the strongest positive and negative associations of all. White–rich or male–rich groups were described overwhelmingly positively, while white–poor or male–poor groups were described overwhelmingly negatively.
“Although I tend to focus on race and gender in my research, social class has continued to emerge as the most important dimension in determining which intersectional groups are seen as positive,” says Charlesworth. “It’s interesting, though: while class was decisive for the quality (positivity or negativity) of the traits, it was relatively less important for the frequency of the traits. In terms of frequency, race and gender were the most important dimensions in explaining which groups dominated the language space.”
Shining a light through language
Charlesworth’s tool will allow scholars to track how biases, including intersectional biases, change over time.
Using FISE, researchers can now see when and how ideas about, say, Black women change. Or they could use FISE to determine how long it takes for demographic changes in occupational statistics (i.e., changes in the percentage of managers who are Black women) to eventually shift the stereotypes embedded in language.
Or, as Charlesworth wonders, perhaps the opposite is true: perhaps changes in language function as prophecy. “It could be that we first see the possibility of female doctors as ‘a thing’ that appears in language. Maybe it’s a possibility that was raised for white women to begin with, or rich women. But we can see whether language is a harbinger of change in the world,” she says. “One of the great things about FISE is that it allows us to examine the directionality of this effect and, for the first time, to study the evolution of intersectional stereotypes throughout history.”
But as her initial analysis suggests, a tool like FISE can also shed light on long-standing concerns about algorithmic bias. As Charlesworth notes, we have a problem if white men dominate the training data and greater wealth is synonymous, in our language, with greater goodness: whatever AI produces will be far more problematic when it comes to poor or underrepresented groups.
We’re already seeing such disparities emerge: AI-generated faces are judged to be more realistic than real human faces, but only for white faces, owing to their dominance in the training data. And the frequency of women and men in Google search results, itself a product of social bias, becomes self-reinforcing by shaping beliefs about the default “person.”
Charlesworth summarizes: “With artificial intelligence and large language models (LLMs) becoming part of our daily lives, we, as consumers, researchers, or leaders, need to understand how bias exists in these tools. In particular, understanding the unique patterns of intersectional biases in AI and LLMs will be essential to making these technologies more equitable in the future.”