Genre Juggernaut: Measuring “Romance”

For its scale and internal complexity alone, the literary genre of “romance” warrants more study than it has received.

We are not the cultural consumers we used to be. Data, streaming, and Web 2.0 have remade how we read and how we watch. Platforms are the new publishers. But although we consume culture differently now, much of how we talk about and study it remains lodged in the analog world of the 20th century. It’s time for our methods to catch up with our objects. Born-digital culture requires a born-digital approach. Hacking the Culture Industries showcases the power of data-driven cultural criticism, and reinvigorates cultural studies for the 21st century. These five new essays move between book culture, streaming TV, social media, and online writing platforms: Squid Game and streaming hits; Goodreads and romance fiction; Twitter and hive-critique; TikTok and cultural attention; and who gets to decide who wins book prizes in the age of social networks. This series takes up a call we issued a year ago: to hack the culture industries. To challenge their dominance by using their data to study them and their stranglehold on cultural production. To tell new stories about culture in a time of ubiquitous data.

—Laura B. McGrath, Dan Sinykin, and Richard Jean So

Late this past summer, The Ripped Bodice, a dedicated romance bookstore in Culver City, Los Angeles, opened its Brooklyn location, and fans of the genre swarmed in as if for a Taylor Swift concert. Braving 90-degree heat in Park Slope, a diverse mix of mostly millennial readers formed a line all the way down to the corner just to get into the shop. When preparations began for a book signing by bestselling nonbinary romance author Casey McQuiston, readers bearing copies of McQuiston’s books created an even longer line, reaching halfway around the block an hour before the author arrived.

The immense interest in romance fiction and the diversity of authors and readers driving its current success have become increasingly apparent. As Melanie Walsh discussed in this series last year, the publishing industry keeps much of the most important and revealing data about which books people are reading “purposefully locked away, … basically inaccessible to anyone beyond the industry.”

But while the producers of books like to guard their secrets, readers are often willing to share. At the University of Pennsylvania’s Price Lab for Digital Humanities, where our team studies contemporary tastes and habits of reading, we’ve been using the Goodreads social book-collection site to access data about books and reading from this more open side of the field. Among other things, the reception-side approach lets us classify books the way readers do themselves, rather than simply accepting the genre labels assigned by publishers or librarians. We’ve studied thousands of avid readers and the hundreds of thousands of books in their collections. And what we’ve learned is that romance is not just one literary genre among others.

Instead, romance is the juggernaut of contemporary literature, standing out from all other genres in its sheer scale and in the wild diversity of its subgenres. Scholars and teachers have long dismissed the genre as a narrow, hypernormative form of fiction catering to happiness addicts. But, in the world of the genre’s actual readers, romance is a vital part of the literary system: large, complex, and dynamic.

Why look to Goodreads for this kind of information? It is an ancient site, at least by social media standards. And, since its acquisition by Amazon a decade ago, Goodreads has managed to alienate even some loyal users with its cluttered format, creaky site architecture, obtrusive parent company advertising, and persistent vulnerability to bad actors abusing the review system to advance their own careers or trash the careers of others. Even so, its membership has kept growing, recently surpassing 100 million. It remains the world’s richest repository of self-reported information on reading: what people read month by month and year by year; how their tastes become broader or narrower over time; and how they respond as readers to new trends in publishing or to broader social and political developments like Black Lives Matter and the COVID-19 pandemic.

A defining feature of Goodreads is that it lets users organize their books into whatever groups, or “shelves,” they like. Their collective shelving preferences often differ significantly from industry labels. Our team gathered the user-generated shelving data for some 600,000 books, corresponding to the libraries of 3,200 highly active Goodreads users.

What jumps out immediately from this data is the enormous scale of romance. Users file books on their romance shelf nearly as often as they do on the shelf for fiction itself (and far more often than on that for nonfiction).

Table 1: The top six genre shelves on Goodreads, based on user-generated shelf data for 600,000 books.


These numbers count books as, say, fantasy, even if they only land on the fantasy shelves of a few readers who use the shelf feature in Goodreads. To focus on the books that readers associate most closely with a genre, we set a rule to count books only when that genre claims at least 10 percent of their top 10 shelf assignments. That may sound like a low bar, but it actually rules out all but the most strongly genre-related books. In the romance category, for example, Ian McEwan’s sweeping metafictional love story Atonement is excluded, since its romance shelving score is only 9 percent. Pride and Prejudice, the most canonical of all marriage-plot novels, achieves only 14 percent romance shelving.1 Even the purest or least hybrid romances one can think of, books like Emily Henry’s Beach Read or Jasmine Guillory’s The Wedding Date, are only shelved as about 50 percent romance.
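For readers who want to see the rule in concrete terms, here is a minimal Python sketch. It assumes that a genre's "share" means its shelf count divided by the summed counts of the book's 10 most-used shelves; that reading of the rule, the function names, and the shelf counts below are all illustrative assumptions, not figures from our dataset.

```python
def genre_share(top_shelves, genre):
    """Fraction of a book's top-10 shelf assignments that fall on `genre`.

    top_shelves: dict mapping shelf name -> number of users who used it.
    """
    # Keep only the 10 most-used shelves, as the rule in the text describes.
    top10 = sorted(top_shelves.items(), key=lambda kv: kv[1], reverse=True)[:10]
    total = sum(count for _, count in top10)
    if total == 0:
        return 0.0
    return dict(top10).get(genre, 0) / total


def in_genre(top_shelves, genre, threshold=0.10):
    """Apply the 10 percent rule (threshold is adjustable)."""
    return genre_share(top_shelves, genre) >= threshold


# Hypothetical shelf counts, invented for illustration only.
book_a = {"fiction": 500, "historical-fiction": 300, "romance": 90,
          "classics": 60, "war": 50}
book_b = {"romance": 500, "contemporary": 200, "fiction": 150,
          "chick-lit": 100, "audiobook": 50}

for name, shelves in [("book_a", book_a), ("book_b", book_b)]:
    share = genre_share(shelves, "romance")
    print(name, round(share, 2), in_genre(shelves, "romance"))
```

In this toy case book_a falls just under the bar and is left out of the romance category, while book_b clears it easily.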

Classifying all the books into genres based on this 10 percent rule, we still found that the romance category contains far more books than other leading genres: twice as many as fantasy, and three times as many as mystery. Nothing else comes close.

Table 2: The top six shelves on Goodreads, using the 10 percent filter described above.


Romance is not only the largest genre category but, according to our analysis, the most distinct and well-defined. We constructed a network based on the top 10 genre-shelf assignments of all our books, including everything from Australia and college to gothic, road trip, and football. We then ran a community-detection analysis, which helps us find shelves that tend to cluster together: for instance, college and football connect to each other more often than they connect to Australia. We used a community-detection algorithm known as the Louvain method to look at all of these connections and cross-shelvings, studying each closely to see what sorts of shelves make up the detected groups.
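The network-building step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it links two shelves whenever they appear together among a book's top shelves and weights the link by the number of books they share. The weighting scheme, the function name, and the four sample books are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_shelf_network(books):
    """Return weighted edges: (shelf_a, shelf_b) -> number of books sharing both.

    books: dict mapping a book id -> list of its top genre shelves.
    Each pair is stored once, with the two shelf names in alphabetical order.
    """
    edges = Counter()
    for shelves in books.values():
        for a, b in combinations(sorted(set(shelves)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical books and shelves, invented for illustration only.
books = {
    "book-1": ["college", "football", "romance"],
    "book-2": ["college", "football", "sports"],
    "book-3": ["australia", "road-trip", "romance"],
    "book-4": ["australia", "road-trip"],
}

edges = build_shelf_network(books)
print(edges[("college", "football")])   # shelves that co-occur on two books
print(edges[("australia", "college")])  # shelves that never co-occur
```

A weighted edge list like this is the kind of input a Louvain implementation expects; the `louvain_communities` function in the networkx library is one widely available option, though the sketch above does not depend on it.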

We can perform this analysis at different levels of granularity: zoom way out, and our network has just two groups, fiction and nonfiction. Zoom in just a bit, and the next subgroup to emerge is romance. If we keep going, we’ll find familiar communities like mystery or children’s, and at extremely detailed levels we’ll even find clusters like Amish fiction.

But before any of those emerge, we get that gigantic collection of romance shelves. In this respect, romance is the first kind of fiction we can separate from the rest.
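One common way to implement this kind of "zoom" is a resolution parameter in the modularity score that community-detection methods try to maximize: a low value rewards a few large groups, a higher value rewards more, smaller ones. The Python sketch below computes that score by hand for a toy network. It is an assumption that a resolution parameter of this kind matches the zoom setting described here, and the shelves and links are invented for illustration.

```python
def modularity(edges, partition, resolution=1.0):
    """Weighted modularity of `partition` (a list of sets of nodes).

    edges: dict mapping (u, v) -> weight, each undirected edge listed once.
    Higher `resolution` penalizes large groups more, favoring finer splits.
    """
    total = sum(edges.values())
    community_of = {node: i for i, group in enumerate(partition) for node in group}
    inside = [0.0] * len(partition)   # edge weight falling within each group
    degree = [0.0] * len(partition)   # summed weighted degree of each group
    for (u, v), w in edges.items():
        cu, cv = community_of[u], community_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            inside[cu] += w
    return sum(inside[i] / total - resolution * (degree[i] / (2 * total)) ** 2
               for i in range(len(partition)))


# Toy network (hypothetical): three tight clusters, two of them loosely linked.
pairs = [
    ("biography", "history"), ("biography", "memoir"), ("history", "memoir"),
    ("romance", "contemporary-romance"), ("romance", "chick-lit"),
    ("contemporary-romance", "chick-lit"),
    ("mystery", "crime"), ("mystery", "detective"), ("crime", "detective"),
    ("romance", "mystery"), ("chick-lit", "crime"),   # links between clusters
    ("memoir", "romance"),
]
edges = {pair: 1 for pair in pairs}

nonfiction = {"biography", "history", "memoir"}
romance = {"romance", "contemporary-romance", "chick-lit"}
mystery = {"mystery", "crime", "detective"}

coarse = [nonfiction, romance | mystery]   # two groups
fine = [nonfiction, romance, mystery]      # three groups

for gamma in (0.5, 1.0):
    print(gamma,
          round(modularity(edges, coarse, gamma), 3),
          round(modularity(edges, fine, gamma), 3))
```

With the lower resolution the two-group partition scores higher; with the higher resolution the three-group partition does. That is the sense in which turning one dial moves the analysis from a coarse picture to a finer one.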

And romance isn’t just easy to find; it also stands out as the major genre that most readily divides up into its own distinct subgenres. For any genre that interests us, we can build a network containing only books in that genre (using the 10 percent threshold we describe above) and run the same kind of community detection. If we use the same granularity (or “zoom” setting) for each genre and count the number of communities detected, we get a sense of each genre’s modularity. The network constructed with romance books turns out to be significantly more modular than all the other major genres. In fact, romance divides up into distinct types or groups nearly as well as the entire nonfiction network, with its well-established subcategories of self-help, cookbooks, biography and memoir, etc.


You might think this is a function of scale: it’s easier to divide a big set of items into subgroups than a small one. But that’s not what’s happening with these literary genres. We’ve found small sets of books, such as those shelved as military fiction, that are significantly more modular than romance. (Apparently readers sharply distinguish Civil War novels from World War II novels and hold both distinct from Christian Crusades novels and space war novels, et cetera.)

What makes romance unique is that it is a double outlier: both much larger and much more modular (much busier internally) than any other major genre of fiction, whether mystery, fantasy, thriller, historical fiction, young adult, or classics. Only the fiction and nonfiction groups, in their entirety, present such a combination of large size and elaborately subdivided internal structure.

This process of subdivision enabled us to break the romance genre space into nine distinct genre groups, as pictured in the graphic below.


What different kinds of romance fictions are captured in these nine groups? An algorithm can see statistical patterns in the co-occurrence of some genre shelves with others, but it has no capacity to describe or explain those patterns in literary terms. Even just naming the groups requires the thoughtful exercise of a human reader’s judgment.

What the algorithm does is give us a starting point, by identifying the largest shelf in each group (measured by the number of times readers used that shelf for books in our romance corpus). For some of the more straightforward genre clusters, the name of that shelf makes a good name for the group as a whole: historical romance (green), fantasy (pink), young adult (yellow), mystery (pale blue). But in other cases this naming strategy is problematic. What is distinctive about subsets of romance fiction called romance (navy) or fiction (purple)? Does it make sense to use m m romance (i.e., male/male) as the name for the red cluster, which contains books shelved as lesbian and genderfluid? Are Christmas books (in grey) really one of romance’s nine most distinct subgenres?
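The provisional naming step is simple enough to sketch in Python: for each detected group, pick the shelf that readers used most often. The group memberships and usage counts below are hypothetical, as are the function and variable names.

```python
def provisional_names(groups, shelf_counts):
    """Label each group with its most-used shelf.

    groups: list of sets of shelf names (one set per detected community).
    shelf_counts: dict mapping shelf name -> times readers used that shelf.
    Returns a list of labels in the same order as `groups`.
    """
    # Sorting first makes ties resolve alphabetically rather than arbitrarily.
    return [max(sorted(group), key=lambda shelf: shelf_counts.get(shelf, 0))
            for group in groups]


# Hypothetical groups and counts, invented for illustration only.
groups = [
    {"historical-romance", "regency", "victorian"},
    {"christmas", "holiday", "winter"},
]
shelf_counts = {
    "historical-romance": 900, "regency": 400, "victorian": 150,
    "christmas": 300, "holiday": 120, "winter": 40,
}

labels = provisional_names(groups, shelf_counts)
print(labels)
```

The label is only a starting point. Deciding whether the biggest shelf fairly describes the whole group remains a matter of human judgment.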

We don’t have space here to sort out all the questions of naming and labelling, let alone to provide detailed descriptions of the nine groups. We will just offer a few quick observations.


To begin with, we find that even the most straightforward-seeming groups can be richly informative about what romance means to readers and how it circulates in the genre space of fiction. The relationship between the fantasy and mystery groups, for example, hints at complications that arise from the different kinds of criteria readers use to define different genres.

Fantasy has cross-pollinated heavily with romance. In fact, the term romantasy is now in wide use on Goodreads and other bookish social media platforms. The fantasy group in our network contains 103 shelves, including urban fantasy, paranormal romance, werewolves, dragons, and vampire hunters. This sprawling community is shaped by shelving data for books such as Deborah Harkness’s All Souls series of clever witch-and-vampire novels, and especially by such massively popular epic fantasy series as Sarah J. Maas’s A Court of Thorns and Roses.

The mystery group, by contrast, consists of just 33 co-occurring shelves, supported for the most part by romantic suspense novels like Colleen Hoover’s Verity and Nora Roberts’s The Obsession. Considering that mystery is one of the powerhouse genres of contemporary publishing (by our count of Goodreads books, it is 75 percent as large as fantasy), this is a startlingly small area of overlap.

The stark differences in how romance relates to fantasy and mystery might be explained by the different principles on which the coherence of the genres depends: setting, in the case of fantasy, and plot, for romance and mystery.

It’s easy enough to set a courtship in a magical or extragalactic realm; but it is quite difficult to integrate it with a detective narrative. Even in cases where the sleuth becomes romantically entangled (as often happens in cozy mysteries), Goodreads users generally won’t shelve the books as romances. Dorothy L. Sayers’s classic Busman’s Honeymoon bears the explicit subtitle “A Love Story with Detective Interruptions,” yet only a tiny fraction of readers shelve it as romance.

The disjunction makes sense if you consider a case like Sue Grafton’s A is for Alibi. Here, private eye Kinsey Millhone seems to have a romantic relationship brewing with a handsome lawyer—until he tries to kill her to cover up his other murders. The pressures of the crime plot kill the mood as well, and readers shelve A is for Alibi as detective fiction rather than romance.

But consider a couple of less straightforward groups on our map: the grey (Christmas) and red (m m romance) clusters. Both are examples of genre communities that can be hard to parse at first look, but become more legible and informative when interpreted by a knowledgeable romance reader. Some of us wondered if the Christmas group wasn’t an error, a quirk of its small scale. But to our romance domain expert, Angelina Eimannsberger, the presence of this cluster was no surprise. It represents a firmly established romance subgenre, with dedicated authors and marketing campaigns. If you have ever wandered through the romance section of a big bookstore in December, you probably passed a holiday display featuring books like the latest seasonal bestsellers from grandes dames of romance like Mary Kay Andrews and the Christian author Debbie Macomber, rom-com novels like Christina Lauren’s Groundhog Day–esque Christmas romance In a Holidaze, and perhaps a millennial tearjerker like the Reese’s Book Club selection One Day in December by Josie Silver. A more inclusive name for this genre community would be holiday romance. Christmas is the group’s dominant shelf in our network, but its centrality there is likely exaggerated due to the time lag in our dataset. We are analyzing collections that users built up over the whole span of their activity on Goodreads, typically dating back 10 or 15 years, and our data excludes books that were added since we scraped the site in late 2021. An analysis more focused on the most recent few years might reflect the emergence of queer and Jewish holiday romances by such authors as Alison Cochrun and Helena Greer, and a more general broadening of romance demographics to include not just those with sentimental investment in a Christian holiday but other readers with other kinds of investment in winter, seasonality, love, and celebration.

Similar observations apply more emphatically to the subgenre that our algorithm calls m m romance, which captures shelves connected to LGBTQ+ themes and stories. M m romance, in which two male heroes fall in love with each other, is the most established subgenre and, according to our data, by far the most connected shelf in the group. That reflects the reality of romance publishing over the last couple of decades, when nearly all the queer romance narratives on offer seemed to be novels about gay white men written by straight white women.

Those kinds of novels remain popular, but neither their authors nor their main characters reflect the sexual and cultural diversity of today’s readers. Here again, our dataset captures the recent past better than it does current trends, which have seen more queer women and nonbinary authors writing romance, and more main characters who are lesbian, bi, pansexual, and asexual. But even in our somewhat backward-leaning cluster, we find the shelves called queer, transgender, polyamory, bisexual, and asexual—shelves that can only expand with the success of LGBTQ+ authors like TJ Alexander, Alyssa Cole, Alexis Hall, Chencia Higgins, Casey McQuiston, and Fiona Zedde.

Of course, the majority of books that readers shelve on Goodreads as literary fiction, or fantasy, or mystery/thriller, or whatever, are not coshelved as romance. But readers perceive romance to bear deep affinity with many, many books on those shelves, far more than any of us expected when we began gathering this data.

Viewed from the reception side, romance is unrivalled in its extensivity: showing up as a significant presence in practically every corner of the literary genre space. And a corollary to that finding is that devoted romance readers—readers traditionally dismissed for “only” reading romance—can be as open and adventurous in their tastes as readers who proudly assert their eclecticism (but who have never even heard of Jasmine Guillory or Alexis Hall).

These findings should be a wake-up call to those of us in academe who are studying contemporary literature without attending in any serious way to romance. Far from being just one genre of fiction among others, romance is the genre of genres, a veritable genre ecosystem in its own right. In its scale and internal complexity alone, it warrants more study than it has received.

But romance matters also for the way it bears on the larger system. Cultural fields, as we know, are relational. In the case of literature, the most significant relation may be that between romance and everything else.

  1. This data is based on the shelf counts for a book’s 10 most common genre shelvings, as reported on the “old-style” Goodreads landing pages. That information disappeared with the introduction of a new page format in 2022. The revamped site does provide access to complete shelf counts, but to extract the top 10 genre shelves from that data would involve different methods than what we present here.
This article was commissioned by Laura B. McGrath, Dan Sinykin, and Richard Jean So. Featured image photograph by Jamie Street / Unsplash (CC by Unsplash License).