Humanities: Let the Hypothesis Testing Begin

The humanities have a replication crisis of monumental proportions: so many theories have never been adequately tested or validated.

Creative writers who publish in more “minor” languages (those less commonly spoken) are “condemned”—in the words of French literary critic Pascale Casanova—to defend and illustrate national history and controversies. But their counterparts, in cultural capitals like Paris or New York City, are free to explore more universal themes, whether of human existence or the nature of literature. Language, for Casanova, is literary destiny.

We now know that Casanova’s groundbreaking and controversial thesis in The World Republic of Letters was, in one respect, almost certainly wrong. On the contrary: writers in less widely spoken languages are significantly less likely to talk about national history and themes than those who work in more widely spoken languages, at least when it comes to writers whose work circulates within international literary markets. In order to achieve recognition beyond one’s borders, so-called minor-language writers are the ones who tend to gravitate to more universal themes. It is Parisian writers, meanwhile, who merrily go on talking about France and all things French.

But in another respect, Casanova’s thesis was also presciently correct. Different stylistic pressures do appear to be exerted on writers working within international contexts, depending on the cultural capital of the language they are writing in. The World Republic of Letters is not a level playing field. It just turns out that the effects of this imbalance run in the opposite direction from the one Casanova theorized.

How do we know this? How can we say with such confidence that Casanova was wrong on one count and right on another? Because we tested her claims.



Working with a team from the Humanities Digital Workshop at Washington University in St. Louis and .txtlab at McGill University in Montreal, we collected data on the writings of 189 writers across 22 languages. These were divided between “major” languages like English, German, and French, and less widely spoken languages like Icelandic, Romanian, Estonian, and Catalan. We selected only prize-winning novels that had been translated into multiple languages, i.e., that had achieved recognition within the so-called World Republic of Letters. We then built models using natural-language processing to approximate Casanova’s idea of “national themes” within our novels in three separate ways and compared the prevalence of these indicators across the different language groups.

Writers in the minor-language category, as we show in our paper, consistently exhibit lower degrees of national preoccupation than those in any of our major-language groups (Fig. 1). The effects are not particularly strong, and there is a good deal of within-group variability. Still, we can be confident that, according to our models, writers from different language communities behave differently with respect to the explicit invocation of nationalist topics. And they do so in a direction opposite to that originally hypothesized by Casanova.

Fig. 1 Nationalism quotient at the document level by language community, with outliers removed

The process we’ve just described is conventionally known as hypothesis testing. Researchers start with a theory to explain something about the world and from this theory derive specific hypotheses about testable relationships that exist in a large population. Sample data is gathered to represent that larger population, and a statistical model is used to evaluate the hypothesis by mapping it onto variables in the data. For example, a hypothesis about the effect of some treatment (receiving a vaccine rather than a placebo) on an outcome (hospitalization for COVID-19) is tested on a random sample of the population to evaluate whether the vaccine affects that outcome.
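To make the logic concrete, here is a minimal sketch of a hypothesis test in Python. The data is entirely simulated, and the group labels and numbers are invented for illustration; this is not our study’s actual pipeline or data. The null hypothesis is that two language communities have the same mean “nationalism” score, and a Welch t-test evaluates how surprising the observed difference would be if the null were true.

```python
# Toy illustration of hypothesis testing on simulated data.
# All scores, group sizes, and means below are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical per-novel "nationalism" scores for two language communities.
major = rng.normal(loc=0.30, scale=0.08, size=80)  # e.g., a major-language group
minor = rng.normal(loc=0.24, scale=0.08, size=60)  # e.g., a minor-language group

# Null hypothesis: the two groups share the same mean score.
# Welch's t-test (equal_var=False) does not assume equal variances.
t_stat, p_value = stats.ttest_ind(major, minor, equal_var=False)

print(f"mean(major) = {major.mean():.3f}, mean(minor) = {minor.mean():.3f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null: the group means differ at the 5% level.")
```

The p-value does not tell us the hypothesis is true; it only tells us how improbable the observed difference would be under the null, given this sample — which is exactly why replication across new samples and new models matters.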

Hypothesis testing is used every day around the world, across numerous knowledge domains. It is one of the most ubiquitous ways to test the validity of our beliefs about how things in the world behave. In the natural and social sciences, it has also been the subject of intense criticism (see “the crisis of replicability”).

It may therefore seem surprising to write a piece extolling the value of such methods for the humanities right now. You want to prescribe us that?

It is worth pointing out that, in response to concerns about the very real problems of replication today, scientists do not suggest we should abandon quantitative methods. Instead, they offer an array of suggestions to improve these methods. We didn’t test the efficacy of vaccines by going with our gut. The idea of replication behind the replication crisis—that results aren’t valid until they have been observed under new conditions by different observers—implies the need for more testing, not less.

So why do we think hypothesis testing is good for the humanities right now? Isn’t this a case of a misguided obsession with “scientific rigor,” which has long characterized computational work in the humanities: an obsession based on a failure to appreciate the unique character of literary and cultural artifacts or the limitations of quantitative methods?

On the contrary—we believe that a turn toward hypothesis testing will help us become more aware of exactly what we are doing and why we are doing it. Testing will help create the conditions for a much broader reception of the key insights of humanities scholarship.


Let’s take our paper described above as an example. Readers will notice that we point out a host of limitations in our study. Our observations, like Casanova’s, are all drawn from a European context. We cannot speak to how these effects might play out on a broader global stage. While we spend a considerable amount of time explaining and validating our modeling of “national themes,” there is no right way to do this. Different models may behave differently. Finally, our samples may contain hidden biases, such that different samples may behave differently.

That’s a lot of limitations, you’ll say. We couldn’t agree more. But rather than see these as a fatal flaw, we believe they help show why hypothesis testing provides such a valuable framework for analysis.

First, the reason we know about these limitations in the first place is because we make our modeling choices as explicit as possible. Readers of Casanova’s study have no idea of the criteria driving her choices of books or the principles that guide how she interprets each of her examples, all of which, not surprisingly, only ever support her thesis.

Second, hypothesis testing helps foreground what we don’t know as much as, if not more than, what we do know. In foregrounding the conditions under which our findings are valid, we draw attention to the limits of our certainty. In distinction to Casanova’s universalizing vocabulary—minor-language writers are all “condemned” to always behave in a particular way—the quantitative reasoning behind hypothesis testing presupposes a more probabilistic view of the world. Writers from our minor-language regions are more likely to engage in a particular type of behavior under certain conditions, but there are plenty of exceptions.

And finally, hypothesis testing is valuable because it pushes us toward consensus-based forms of knowledge. Because no one study can ever control for all conditions under which a belief may be valid, hypothesis testing provides a way for comparing results obtained by different sets of researchers. The point of doing hypothesis testing is not for you to take our word for it. Rather, the goal is to provide a framework so that you can further evaluate the conditions of our tests and see if you arrive at approximately the same belief, and if not, to indicate the conditions under which we disagree.

We fully expect to encounter strong resistance to these ideas. Resistance is after all the humanities’ stock-in-trade. Perhaps the best response is to point out that humanists have always been in the business of proposing hypotheses, without, however, reaping the benefits that come from rigorously testing them.

For example, some might be inclined to suggest that hypothesis testing isn’t an appropriate method, because humanists do not make general claims about the world that can be valid independently of the observer who makes them (itself a general claim). This can be termed “the particularism hypothesis”—the belief that the humanities do not study general behaviors but particular instances of things. But recent work in our lab in Montreal, using a sample of over 15,000 statements drawn from over 200 research articles, found that researchers in disciplines like literary studies or history make generalizing statements almost as often as—and, in some journals, more often than—researchers in quantitative disciplines like sociology. The belief that humanists don’t generalize about the world is not well supported by the data. Hypothesis testing gives us another way (and, it cannot be stated enough, not the only way) to try to validate our arguments.

Others might insist that our objects of study do not lend themselves to quantifiable measurements of the sort that can be subject to replication. To be sure, modeling “nationalism” or “generalization” in language is complicated, but this is something all disciplines wrestle with. “Wealth” isn’t a straightforward concept, nor is “psychological well-being.” Dramatic improvements in language modeling over the years mean we can create increasingly sophisticated models of linguistic behavior. This is exciting, and in our view to be welcomed. Indeed, because humanistic concepts are so challenging to model, successful efforts in this regard are likely to encourage more sophisticated approaches in other fields. We have the opportunity to take the lead in efforts to computationally model language and communication.

So why do we need to bother with all of this? Because humanists ask important questions—indeed, some of the most foundational questions concerning human existence: How can we understand the factors that determine the cross-cultural circulation of ideas? What role does storytelling play in cognitive development, problem-solving, group coherence, personal-identity formation, or political change? To what extent do the social biases that pervade literary and cultural representations help perpetuate social inequality and injustice?

Without more explicit and transparent efforts to validate our answers to these questions, however, the impact of our research will remain limited. Hypothesis testing gives us the means of providing credible, consensus-driven answers that speak the lingua franca of today’s knowledge environment.

Let’s be honest. While it is tempting to think of the replication crisis as somebody else’s problem, the humanities have a replication crisis of monumental proportions. We have so many theories that have never been adequately tested or validated. Casanova’s book offers just one example. Some will stand the test of time; some, like Casanova’s, won’t. New theories will be developed, and old theories will be adapted. Whatever the case, our field will benefit.

Our hypothesis is that this very simple, very unrevolutionary idea of the hypothesis test will be one of the most important, discipline-changing, research-altering concepts for the future of the humanities. Of course, this remains to be proven.


This article was commissioned by Richard Jean So

Featured image: Detail of Scientist counting chromosomes. Photograph courtesy of National Institutes of Health / Flickr (CC BY-NC 2.0)