Our lives are increasingly mediated by algorithmic systems, and the racial, gender, economic, and other biases they encode. Consider large language models (LMs). These models effectively memorize a language—by reading massive collections of digital texts—in order to enhance certain automated language tasks (e.g., prediction, summarization, synthetic text generation). And these LMs are now a central pillar of corporate investments in artificial intelligence. The problem—as Timnit Gebru, co-lead of Google’s Ethical AI Team, explained in a coauthored paper—is that these LMs reproduce and amplify the “racist, sexist, ableist, extremist” and other harmful ideologies present in the texts they learn from, many of which are steeped in the toxic language of online platforms. When Gebru’s managers insisted that the paper lacked scientific merit, she spoke out and was subsequently fired, along with other members of her team.
This is by now a familiar narrative. Corporate behemoths like Google, fearing the effects on their bottom line, downplay the ethical consequences of their algorithmic systems. And researchers like Gebru—joined by a vital and growing chorus of voices in academia, journalism, and government—call on us to interrogate, dismantle, and reengineer these systems, treating them as the human-machine assemblages that they are. Easy to miss in the outcry against what Ruha Benjamin calls the “New Jim Code” is the intellectual and conceptual terrain opened up by this same critical agenda.1 The task of dismantling LMs and other black boxes means not only recognizing that they learn to see racial and other harmful imaginaries, but also understanding how they see them.
Here is where the problem of LMs becomes a learning opportunity for cultural historians or sociologists. Scholars in these fields have long pursued theories and rich descriptions of how racial and other discriminatory discourses operate in society. If LMs do reflect a collective racist imaginary, might historical and sociological theories help us understand how these models are seeing race? Conversely, what might these theories and descriptions learn from the new LMs themselves?
Such questions are starting to be addressed in literary studies. Several recent studies use components of language models to better understand how racial imaginaries have operated in US fiction across decades—and at the scale of thousands of novels.2
This is reflected in my own work on modern Japanese literature. In The Values in Numbers: Reading Japanese Literature in a Global Information Age (2021), I look into how patterns of racial discourse manifest under conditions of empire. And I investigate how these patterns, at another level, relate to practices of literary characterization.
To study racial representation under empire is to confront a truth: the social dynamics of race are best apprehended not by isolating individual racial groups, but, rather, in the “space of interracial conflict itself.”3 This conflict is embodied in Japan’s imperial era, which stretches from the late 19th century to the early postwar period. Throughout that time, Japanese people were simultaneously defining themselves in relation to imperialist Westerners and peoples subject to Japanese colonization: Koreans, Chinese, and various indigenous Others.
Scholars have given us close-up views of how this conflict played out in specific cases. But what can an expanded map—built with the aid of digital archives and computational tools—add to our understanding of this space?
This is where language models, or at least one element of them, become useful. It is important to recognize that how LMs see race is a function of both the social and the technical. Technically, their skewed vision stems from the contextual word embeddings at the heart of their machinery. These embeddings quantify the meaning of words by the company they keep, treating each word as a kind of probabilistic function of the words that tend to appear around it. Once an embedding has been learned from a collection of texts, it can then be projected into mathematical space, such that words sharing similar company (i.e., with stronger associations) appear closer together.
The word “king,” for instance, if used to indicate a sovereign, will cluster near “monarch,” “ruler,” and related terms. This is because, more often than not, these words keep the same lexical company. Higher-order associations are also captured, such that words for “king” hover closer to “man” and “he,” in this space, than to “woman” and “she.” This is because, well, kings have historically been men.
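The geometry being described can be sketched with a toy example. The vectors below are hypothetical, hand-made illustrations, not values drawn from any trained model; real embeddings have hundreds of dimensions and are learned automatically from text. Cosine similarity stands in for “closeness” in the embedding space.

```python
import math

# Hypothetical 3-dimensional "embeddings," for illustration only.
# Real models learn high-dimensional vectors from massive text corpora.
vectors = {
    "king":    [0.9, 0.8, 0.1],
    "monarch": [0.85, 0.75, 0.15],
    "man":     [0.7, 0.2, 0.1],
    "woman":   [0.7, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 for words that keep similar company."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# In this toy space, "king" sits closest to "monarch," and closer
# to "man" than to "woman"—the higher-order association in question.
print(cosine(vectors["king"], vectors["monarch"]))
print(cosine(vectors["king"], vectors["man"]))
print(cosine(vectors["king"], vectors["woman"]))
```

The ordering of the three similarity scores, not their exact values, is what matters: it is how an embedding space encodes the historical fact that “king” keeps company with “man.”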
It is here that the social enters the equation. Embeddings learn the normative associations of whatever body of texts they train on. Texts can overrepresent certain majority voices (i.e., white, male), as is often the case with texts culled from online platforms like Reddit. From such texts, the models learn the dominant discriminatory biases encoded therein.
This leads to situations where LMs have trouble, for example, generating text about Muslims that has nothing to do “with violence … or being killed,” as one machine-learning PhD student has pointed out. The models become, in short, stellar students of stereotype, because they fixate on the regularly repeated associations made between a category of people and a specific vocabulary. Beneath this vocabulary, as Edward Said once proposed, is a set of representative tropes, which are to actual reality what “stylized costumes are to characters in a play.”
Leveraging the technical affordances of word embeddings, I have tried reconstructing some of the “costumes,” or patterns of representation, consistently applied to racialized Others. I do so by examining two corpora: two thousand works of Japanese-language fiction, and nine thousand articles taken from newspapers and general-interest magazines.
First, I extract clusters of semantically related words from each corpus. This means that words such as “wear,” “shoes,” “hat,” and “overcoat” are grouped together as “Clothing.” I then search for where these clusters appear around a long list of racial and ethnic markers, which I provisionally group under supercategories like “Japanese,” “Chinese,” and “Native.” And I identify cases where words from a particular cluster appear more often around one set of markers (e.g., terms for “Chinese”) than another (e.g., terms for “Japanese”). I am especially interested in clusters that disproportionately gravitate around references to specific Others, as compared with references to “Japanese.” The results of this process are visualized here as a kind of heat map, or semantic grid.
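The counting procedure just described can be sketched in miniature. Everything below is an illustrative stand-in: the clusters, marker terms, and sentences are invented English examples, not the Japanese corpora, vocabulary, or exact method used in the actual study.

```python
from collections import Counter

# Invented stand-ins for the semantic clusters and racial/ethnic
# markers described above; illustrative only.
clusters = {
    "Clothing": {"wear", "shoes", "hat", "overcoat"},
    "Voice": {"moan", "cry", "scream"},
}
markers = {
    "Japanese": {"japanese"},
    "Native": {"native"},
}

def cluster_counts(sentences, group):
    """Count how often each cluster's words co-occur in a sentence
    with a marker term for the given supercategory."""
    counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        if words & markers[group]:
            for name, cluster in clusters.items():
                counts[name] += len(words & cluster)
    return counts

sentences = [
    "the native began to moan and scream",
    "the japanese man chose a hat and overcoat to wear",
    "a native cry echoed across the valley",
]

# Comparing the counts across groups shows which clusters
# disproportionately gravitate toward which markers.
print(cluster_counts(sentences, "Native"))
print(cluster_counts(sentences, "Japanese"))
```

In this tiny sample, voice-related words cluster entirely around “Native” markers and clothing-related words around “Japanese” ones; scaled up to thousands of texts, disparities of this kind are what the shaded cells of the semantic grid record.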
These grids, constrained as they are by my limited sample of the textual record and by the computational methods I use, offer just one possible perspective on the Japanese racial imaginary under empire. Still, there is much to read in the patterns on display.
At the broadest level, we can read for points of overlap between the two grids (e.g., facial features and Westerner). These suggest stereotypical associations that persist across different discursive domains.
Within each domain, we can proceed to read relationally. This requires looking for points of convergence and divergence in the semantic clusters associated with each group more often than with “Japanese” (e.g., in the fiction corpus, “Chinese” and “Native” are disproportionately associated with words signifying nature and landscape). This also means reading the grid’s blank spaces, which hint at the tropes that go unspoken with respect to some groups or are otherwise missed by my search method (e.g., the glaring lack of clusters associated with “Korean”).
Finally, each shaded cell is an invitation to return to the texts and read the repetitions that accumulate to form the association or stereotype in question. When we zoom into the cell that binds voice-related words to “Native,” for instance, we uncover dozens of passages in works by canonical authors, including Natsume Sōseki and Sakaguchi Angō. But we also find them in lesser-known works of pulp adventure and detective fiction. Together, these different genres project a collective image of the “Native” voice as an incomprehensible moaning, crying, or screaming: a version of what Sianne Ngai identifies as the excessively “lively” or “agitated” ethnic subject.4
With each reading strategy, we can leverage the semantic grid as a background tapestry, against which to explore how the stylized costumes of stereotype circulate and constrict the lives and bodies subject to them. To the extent that today’s large language models learn this tapestry—as woven by the voices and bots that dominate our information environments—there is merit to exploring them more deliberately via such strategies.
When we use these models to read the past, we can validate their results against qualitative accounts of interracial conflict. Such investigations of the past may, in turn, offer methodological templates for using these models to read the present.
There is an obvious lesson from such investigations. Race in the machine, as Gebru and others working at the intersection of ethics and artificial intelligence know well, is always going to be a function of what we feed it.
One feels this acutely when feeding the machine from digital collections that are a minor slice of an already porous and fragmented print archive, like that of Japan’s imperial era. We must continue to diversify these collections, so as to widen their scope of representation and counter their most extreme biases. Even so, as long as there are inequities and power imbalances between social groups, and as long as these materialize in language as essentialist and categorical statements, it will be hard to escape the prison-house of biased language models.
Yet, there are ways to think with and beyond the biases LMs are good at detecting. In my own work, I loosen the semantic grids they help create by turning to ideas about character.
Literary critics understand character as an evolving set of interpretative practices and values (e.g., a preference for round versus flat characters; seeing character as a function of internal versus external traits). Character is also a product of an unequal distribution of attention: whereas a select few characters (usually members of the dominant social groups) are psychologically rounded out, a great many remain lightly drawn (usually members of minority groups). Using my semantic grids as a window onto the political economy of character, I track vocabularies of racial bias as they diffuse through the uneven character spaces of Japanese fiction. I look for when individuated characters, particularly those drawn by colonial subjects, feel these vocabularies intensely: both as suffocating costume and as something they desperately want to rip to shreds.
In this way, the patterns of bias detected with word embeddings are a means to explore where they circulate or break down. What if LMs could perform a similar role for cultural analysis today? Social media platforms are arguably revitalizing older economies of character. In 18th-century England, as Deidre Lynch explains, rapid economic and imperial expansion encouraged the sorting of people by type via external, superficial features.5 Now algorithms sort us by our digital traces and online personas, these virtual selves further reified by the platforms on which they’re performed and fed back to us.
LMs might teach us something about this new economy of character by helping investigate the kinds of racial and other social sorting it produces. We could use LMs to see how biases change across different collections of online text, searching for points of divergence from all-too-familiar stereotypes. We might also redirect them toward moments where biased language breaks down; or look to when such language reassembles as other conventionalized vocabularies, which seem unbiased only to those privileged enough to see and be seen by them.
The necessary critique of LMs and their algorithmic biases is thus also a learning opportunity. To confront these biases is to contend with the fact that they see racial and other imaginaries in ways familiar to us, itself an indication of how reliant these imaginaries have always been on the slow accretion of semantic associations. Used critically and in deliberate dialogue with other ways of knowing, bias in the machine can teach us as much about machines as about ourselves.
This article was commissioned by Richard Jean So.
1. Ruha Benjamin, Race after Technology: Abolitionist Tools for the New Jim Code (Polity, 2019), p. 8.
2. See Mark Algee-Hewitt, J. D. Porter, and Hannah Walser, “Representing Race and Ethnicity in American Fiction, 1789–1920,” Journal of Cultural Analytics, December 18, 2020; Richard Jean So, Redlining Culture: A Data History of Racial Inequality and Postwar Fiction (Columbia University Press, 2020); and Sandeep Soni, Lauren F. Klein, and Jacob Eisenstein, “Abolitionist Networks: Modeling Language Change in Nineteenth-Century Activist Newspapers,” Journal of Cultural Analytics, January 18, 2021. A recent example from sociology is Laura K. Nelson, “Leveraging the Alignment between Machine Learning and Intersectionality: Using Word Embeddings to Measure Intersectional Experiences of the Nineteenth Century U.S. South,” Poetics, March 10, 2021.
3. Mustafa Emirbayer and Matthew Desmond, The Racial Order (University of Chicago Press, 2015), p. 42.
4. Sianne Ngai, Ugly Feelings (Harvard University Press, 2005), p. 93.
5. Deidre Lynch, The Economy of Character: Novels, Market Culture, and the Business of Inner Meaning (University of Chicago Press, 1998).