Galaxy Brain

Teaching is a collective project. What happens when we see it that way?

I remember learning in college that Max Weber was the last person who knew everything. This wasn’t meant literally, though he had a prodigious grasp of science, politics, and culture. Rather, it referred to his thinking about how knowledge and scientific inquiry are organized, and how we relate to them. Weber concluded that in the modern era, the fragmentation of knowledge into specialties and professions made a complete or unified view of knowledge impossible. We work in our little corners, and know that others are working in theirs, and that relationship to knowledge is, Weber argued, disenchanting.

Despite its name, the university doesn’t solve this problem so much as administer it: pushing students through years of training toward specializations. The intangible products of these experiences are fields. We work in them and are defined professionally by them, but it’s hard to see them. And the university multiplies this experience across dozens or hundreds of them.

A neat trick in the age of big data is to gather some subset of the experiences that define fields—say, the teaching choices recorded on millions of syllabi—and bring their connections into view. This is what the Open Syllabus “Galaxy” does. It’s a navigable plot of the million most frequently assigned titles across all fields, clustered by the extent to which they are assigned together. And it’s the closest thing available to a bird’s-eye view of anglophone higher education.

The Galaxy attracts a lot of oohs and aahs. We could say that it enchants in both the normal and Weberian senses—providing a whiff of the old unity of knowledge. And most people stop there, unsure how to make sense of it or use it. After all, there are no “Galaxy-brain” users—no institutions or agents that work at this level of abstraction.

But we can try to take a few steps in that direction. And doing so will take us into the weeds, so to speak, of plots and fields.


What Was Communications?

In the mid-2000s, I ran a program at the Social Science Research Council (SSRC) that provided a home for conversations about the field of communications. Traditionally, communications in the US had been mostly “mass communication”: a cluster of professional tracks that prepared students for careers in broadcasting, journalism, and public relations. That began to change in the 1990s, led by programs with disciplinary ambitions derived from the core social sciences. Over the next two decades, mass communication became a much more scholarly field. The number of PhD programs grew from three in 1992 to 23 in 2017, and “mass” was generally dropped from program names, leaving “communications” or, depending on the institutional lineage, “media studies” as the most common descriptors.

Our work was aligned with efforts to promote more policy research, but this was only one position in a larger conversation about field identity. Should communications be more theoretical or more applied? Did the field have distinctive methodologies or was it primarily defined by its subject matter? Should it be the home for emerging internet studies? Was it the new liberal arts? Because communications was still receptive to (and largely led by) refugees from other social sciences, it was only weakly disciplinary in the sense of shaping and policing methods and directions of inquiry. Questions of identity and priorities remained open to a much greater extent than in sociology, political science, and other adjacent fields with longer histories of consolidation.

As a relative newcomer to this conversation with a degree in literature, I was struck by the gap between the normative focus of the SSRC conversations and the comparative lack of descriptive accounts of the field. Everyone had ideas about what the field should be. It was less clear to me that these proceeded from much substantive agreement about what it was.

In 2008, I proposed to a group of communications program chairs that we might be able to use course content as the basis for a more descriptive account—one that could, among other things, provide a baseline for the normative agendas under discussion. Communications, from this perspective, could be understood pragmatically as the body of knowledge that its faculty chose to transmit through teaching.

Teaching had the notable advantage that everyone did it—the scholarly programs, the professional programs, programs focused primarily on old or new media, two-year and research-intensive schools. In this respect it provided a better basis for understanding the activity of the field than, for example, the forms of research journal–based citation analysis that had become common. Course syllabi were the obvious raw material for this work, but collecting and analyzing them across even a small range of programs proved laborious. My position at the SSRC had little scope for unfunded research, so I didn’t pursue it.

But I also didn’t forget it. In 2014, now at a public policy institute affiliated with Columbia University, I cofounded a research project called Open Syllabus. With more syllabi posted online in large numbers, we were able to solve the problem of collecting at scale. Because cloud computing and machine learning had become more accessible and affordable, we were able to analyze syllabi computationally in ways that, a few years before, would have required punitive amounts of manual labeling and coding. We now work with a corpus of around 21 million syllabi (and growing).

Much of our early work focused on how to extract assigned titles, course years, fields, and institutional locations from syllabi. These four elements were enough to organize a top-down account of the curriculum based on counting the appearances of titles across fields and schools, and over time. By 2016, we were able to turn this data into searchable lists and rankings (you can explore the current version in Open Syllabus Analytics). By 2021, we were able to turn it into a massive plot: the Co-Assignment Galaxy.


How to Read a Galaxy

If you want to follow the next sections, there’s no substitute for spending a couple of minutes exploring the Galaxy website. If you want a more technical account of how it was made, here’s a post by its creator, David McClure. If you’re ready to move on, let’s look at a screenshot of the portion that covers most of the social sciences and humanities, with political science in purple in the middle, history below it in orange, literature in teal at the bottom, and economics in green to the right.

To make sense of this picture we need to understand a few things about its composition.

First, every assigned title is represented as a dot. The bigger the dot, the more often it is assigned.

Second, titles cluster together based on how often they are assigned together on the same syllabi. The more “co-assignments,” the tighter the clustering.

Third, titles are colored based on their predominant field of assignment. An orange dot, for example, is a title that is assigned at least half the time in history classes.

Fourth, when a title has no predominant field of assignment, it is colored gray. Gray titles are generally interdisciplinary—taught across multiple fields.
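
For readers who think better in code, the four rules above can be sketched in a few lines of Python. This is a toy illustration, not the Open Syllabus pipeline: the syllabi, the one-letter titles, and the function names are all invented for the example, and the 50 percent threshold is an assumption borrowed from the “at least half the time” wording of the third rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each syllabus has a field label and assigned titles.
SYLLABI = [
    {"field": "History", "titles": ["A", "B"]},
    {"field": "History", "titles": ["A", "B", "C"]},
    {"field": "Sociology", "titles": ["A", "C"]},
    {"field": "Political Science", "titles": ["C", "D"]},
    {"field": "English", "titles": ["C"]},
    {"field": "Political Science", "titles": ["D"]},
]


def assignment_counts(syllabi):
    """Rule one (dot size): the number of syllabi that assign each title."""
    counts = Counter()
    for syllabus in syllabi:
        counts.update(set(syllabus["titles"]))
    return counts


def co_assignment_counts(syllabi):
    """Rule two (clustering): how often two titles share a syllabus."""
    pairs = Counter()
    for syllabus in syllabi:
        for a, b in combinations(sorted(set(syllabus["titles"])), 2):
            pairs[(a, b)] += 1
    return pairs


def predominant_field(title, syllabi, threshold=0.5):
    """Rules three and four (color): the field that accounts for at least
    `threshold` of a title's assignments, or "gray" if no field does.
    The threshold value is an assumption made for this sketch."""
    fields = Counter(s["field"] for s in syllabi if title in s["titles"])
    total = sum(fields.values())
    if total == 0:
        return "gray"
    field, n = fields.most_common(1)[0]
    return field if n / total >= threshold else "gray"


sizes = assignment_counts(SYLLABI)
pairs = co_assignment_counts(SYLLABI)
colors = {title: predominant_field(title, SYLLABI) for title in sizes}
```

In this toy corpus, title C is the most assigned title, but it is spread evenly across four fields, so it comes out gray. Title A is assigned two times out of three in history classes, so it takes the history color.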

These four principles really reduce to two, governing layout and color. And these two incorporate different ways of thinking about fields—respectively, content-based and institution-based. By content-based I mean that the layout is derived solely from similarities in the assigned contents of millions of classes, with no prior knowledge about how those classes divide into sociology or history or physics (we added the large text labels later for convenience). By institution-based, I mean that the Galaxy also incorporates a taxonomy of common university choices about how to divide up human knowledge—into, say, biology and chemistry departments, but rarely biochemistry departments.

What can we do with these principles? Let’s return to communications. Zooming in on the area where communications titles cluster (yes, it’s labeled Media Studies, sorry), we can see that titles are split into two large zones, roughly corresponding to “old” and “new” media. Journalism, broadcast technologies, and the cultural studies traditions derived from them are on the left. New media and various forms of internet studies are on the right.

Next, the distribution of color. Titles assigned predominantly in communications are orange. Journalism appears in red. But titles without a primary field of assignment are gray. The predominance of orange on the left suggests that the field of communications dominated the study of older media. The predominance of gray on the right suggests that the study of internet and new media phenomena is much more interdisciplinary. For most of the gray texts in this zone, communications is the largest field of assignment but not the dominant one. It accounts for around a quarter of the assignments of Lev Manovich’s The Language of New Media; a quarter of Sherry Turkle’s Life on the Screen; and somewhat fewer for Yochai Benkler’s The Wealth of Networks—to cite three frequently assigned titles by authors who, incidentally, are not based in communications programs. The titles cluster here, rather than in the other fields to which they contribute, because this is where they are most often assigned together.

Like all plots of this kind, the Galaxy is an idiosyncratic representation of a complex dataset. Its layout is sensitive to parameter settings and compositional choices. One source of sensitivity is the choice of one layout over another. In our case, we tried to balance the relationship between density and legibility in the layout. Turn up the gravity (a.k.a., the edge weighting) and the plot collapses into a dense blob. Lower it too much and the clusters become diffuse clouds. These versions are all equally “true,” but not—to human perception—equally interpretable.
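
A deliberately tiny model shows why the gravity setting matters. The sketch below is not the Galaxy's layout algorithm. It assumes a single pair of titles joined by one edge, a spring-like attraction scaled by a hypothetical `gravity` parameter, and a repulsion that grows as the two dots get closer. All names and constants are invented for illustration.

```python
def equilibrium_gap(gravity, weight=1.0, steps=5000, step_size=0.01):
    """Distance at which two linked dots settle in a toy force model.

    Assumed forces (illustrative only):
      attraction = gravity * weight * gap   (pulls the dots together)
      repulsion  = 1 / gap                  (pushes them apart)
    """
    gap = 1.0  # arbitrary starting distance
    for _ in range(steps):
        net_push = 1.0 / gap - gravity * weight * gap
        gap += step_size * net_push
    return gap


tight = equilibrium_gap(gravity=4.0)    # strong gravity
medium = equilibrium_gap(gravity=1.0)   # baseline
loose = equilibrium_gap(gravity=0.25)   # weak gravity
```

The same pair of titles, with the same co-assignment weight, settles closer together or farther apart depending only on the gravity setting. Multiplied across a million dots, that is the difference between a dense blob and a diffuse cloud.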

Another source of variation is the mathematical process of “dimensionality reduction,” which enables a dataset with millions of combinations of co-assignments to be represented two dimensionally. This process is somewhat analogous to flattening a globe and it introduces similar distortions. The distance between points or clusters in the flattened view isn’t necessarily meaningful. And it’s possible for groups of points that are far apart in higher-dimensional representations of the data to get sandwiched on top of one another when squeezed down to two.
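
The sandwiching effect is easy to reproduce at small scale. The sketch below uses the crudest possible reduction, simply discarding one of three coordinates, which is an assumption made for clarity and not the method behind the Galaxy. The point names and coordinates are invented.

```python
import math


def distance(p, q):
    """Euclidean distance between two points with the same number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def flatten(point):
    """Crude stand-in for dimensionality reduction: keep x and y, drop z."""
    return point[:2]


# Hypothetical positions of three titles in a three-dimensional space.
POINTS = {
    "title_1": (0.0, 0.0, 0.0),
    "title_2": (0.0, 0.0, 10.0),  # far from title_1, but only along z
    "title_3": (3.0, 4.0, 0.0),
}

flat = {name: flatten(p) for name, p in POINTS.items()}

gap_before = distance(POINTS["title_1"], POINTS["title_2"])
gap_after = distance(flat["title_1"], flat["title_2"])
kept_before = distance(POINTS["title_1"], POINTS["title_3"])
kept_after = distance(flat["title_1"], flat["title_3"])
```

Here `title_1` and `title_2` start ten units apart and end up on exactly the same spot, while the distance between `title_1` and `title_3` survives intact. A reader looking only at the flattened view has no way to tell which of the two neighbors is real.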

Because of these factors, there is a lively debate in data science about how far one can go in interpreting these plots. Much of this conversation has played out at the intersection of data science and computational biology, where the use of dimensionality reduction has become common and where the resulting representations can be tested in limited ways against biological “ground truth” derived from other sources. Methods have also emerged to test for zones of high and low distortion within the plots themselves.

The Galaxy, in other words, is a product of multiple layers of interpretation and reduction. And all further interpretation builds on this wobbly foundation. But these problems are also common to the interpretation of visual media in general—to paintings, photographs, diagrams, and other forms of visualization that mediate their subjects. Any interpretation of the Galaxy is built on—and also tested against—other forms of knowledge about the same objects. In genetics, this might be another dataset that provides independent information about gene location. For academic fields, which have no real-world spatial dimension or even agreed upon definition, the subjectivity of interpretation is closer to the surface: Do interpretations of the plot make sense in light of what we know from other sources, such as studies of fields or our experience with them? Can those interpretations be generalized or extended beyond our priors?


Field Grammar

The Galaxy has a narrow range of compositional elements—a kind of grammar that describes the ways in which layout and color convey information about fields. Earlier, I noted that emergent fields show a lot of gray because they tend to be dominated by texts that are assigned across multiple disciplines. “Gray clusters” are part of that compositional grammar. New media studies is predominantly a gray cluster. Environmental studies (below, and Galaxy link) is another—visually capturing the consolidation of interdisciplinary programs in this area over the past 20 years. And it’s possible to see how the field remains a composite of different subject areas and disciplines. History titles (the orange dots) and English titles (teal dots) are the most visible contributors to a section in the north that deals mostly with environmental history and ethics. Political science (purple) is the main contributor to a section to the south focused on environmental politics and social movements. Animal studies and food studies are nearby but distinct.

What if we zoom out? Energy studies is just off screen to the south; ecology is mostly contained within biology to the east; urban studies is to the northwest; geography is to the southeast. We could say that the Galaxy describes environmental studies as a multidisciplinary field that explores human interaction with the physical environment and biological systems.

What, in contrast, does a well-consolidated field look like? In the Galaxy, it’s one in which the content-based and institution-based approaches strongly coincide. English is a good example: its huge corpus of titles is assigned predominantly on English course syllabi, which is reflected in the consistent teal coloring across the major teaching subfields. Accordingly, there is very little gray in the English clusters and comparatively few significant zones of contact with other fields. What does interdisciplinarity look like in the English curriculum? The Galaxy suggests two main varieties. First, contact zones between national literatures and national histories (the latter show up in orange). See, for example, the cases of Australia (Galaxy link) and Ireland (Galaxy link). Second, a number of gray interdisciplinary fields where English is an important but secondary contributor. Film (Galaxy link) and gender studies (Galaxy link) are the most prominent examples.

Big, well-consolidated disciplines often resemble archipelagos that spread out along their major teaching subdivisions. This subfield geography in English is distinctive and reflects the continued periodization of teaching by century, country, and—to a lesser extent—genre.

The disciplinary gravity that holds these clusters together, though, can be fragile. In the real world, classes that tick more than one disciplinary box can be cross listed. In the Galaxy this tension has to be resolved through color and spatial location. Most fields have inflection points where disciplinary commonalities can be outcompeted by strong thematic or problem-centered curricula, which “pull” titles into their orbits. Such tensions between disciplinary and topical organizations of teaching and research have been the starting point for many new fields over the years, such as the area studies fields that emerged in the postwar decades in part through SSRC programs.

Area studies fields turn out to be strong attractors of disciplinary titles. There is, for example, a large semicircular cluster for Asian and South Asian studies, composed of an arc running from India in the southeast to China and Japan in the northwest (Galaxy link). What’s the main thematic connector between South Asia and East Asia? The study of Hinduism and Buddhism, here in teal and gray. There is a corresponding regional Africa cluster with differentiated political science, history, and literature subclusters (Galaxy link) and a similar one for Latin America (Galaxy Link). And there is a more nebulous cluster covering the Middle East, North Africa, and the study of Islam—with the last contributing a strong interdisciplinary dimension to the field (Galaxy link).

Perhaps the clearest example of an area studies field is Eastern Europe, which is in practice dominated (still) by the study of Russia (Galaxy link). Russian studies is composed of connected but well-differentiated “stripes” of assigned titles sourced from its major contributing disciplines: teal for literature, orange for history, purple for political science, and green for a small body of work in economics. Why does Russian studies have stripes while a topical field like environmental studies is more of a gray blob? Our field grammar would attribute it to a difference in organization: in Russian studies, historical, political, and literary titles don’t mix often enough in the curriculum to produce gray circles. Russian studies is, in this reading, a strong topical field but not a very interdisciplinary one. It has a lot of curricular autonomy, consistent with decades of Cold War investment, but also a long “tail” that influences the teaching of Eastern Europe and adjacent themes. Curricular patterns established decades ago remain well organized into the present.

Let’s look at a field with a peculiar shape. Across several versions of the Galaxy, sociology has been a centrally located field pulled in various directions by other fields. In this version, sociology’s main body of titles is laid out along two axes: a horizontal axis that traces a path through the main traditions of social theory, and a diagonal axis, running roughly northwest to southeast, organized around issues of class, family, and gender (Galaxy link). At its edges, the core gives way in all directions to interdisciplinary contact with other fields. Sociological theory blends into political theory and political philosophy to the northeast; socioeconomic literature leads toward social work to the northwest; studies of family lead south toward a gender studies cluster. Other major sociology subfields get pulled, in this version of the Galaxy, toward other topical attractors. Sociological work on cities, for example, is found mostly in urban studies, a highly interdisciplinary (gray) field connected to architecture (Galaxy link). Work on crime and deviance is similarly interdisciplinary and closest to the curriculum in criminal justice—a career-oriented field that integrates sociological literature (Galaxy link). Work on race has no cluster of its own but is diffused throughout the sociological archipelago. More than most fields, sociology straddles these different field logics: core and periphery, contact zones with other fields, and the pull of topical attractors that capture sociological subfields.

A reasonable criticism of this approach to interdisciplinarity is that it depends heavily on our starting assumptions. Environmental studies, for example, isn’t in our initial taxonomy of 62 fields, which means that its cluster is populated, necessarily, by titles linked to syllabi assigned to other fields. Have we shortchanged environmental studies?

We can explore this indirectly by looking at women’s studies, which we do classify as a distinct field. Women’s studies has similarities to environmental studies insofar as it is a topical field that draws on multiple disciplines. It is also a relatively recent field—though with a somewhat longer history of institutionalization dating back to the 1970s. For our purposes, women’s studies made the initial cut not because of a stronger inherent claim to field status, but because women’s studies syllabi showed up often enough and distinctively enough in our initial syllabus labeling to produce a reliable classifier. The field taxonomy that emerged from that process has obvious limitations but it isn’t capricious—it can be mapped back to the Department of Education’s 2010 CIP (Classification of Instructional Programs) codes, which is itself a glorious mess that distinguishes around twenty-seven hundred fields. It’s hard to blame the DoE for these category problems. University approaches to classifying knowledge are diverse and constantly changing.

What makes women’s studies interesting, in this context, is that the women’s studies cluster is mostly gray despite having an available dedicated color. In practice, nearly all of the titles associated with the cluster are assigned across a range of fields—with sociology prominent and exercising the strongest gravitational pull, but with important contributions from other fields. English, philosophy, history, and political science are the top-ranked fields of assignment for Simone de Beauvoir’s The Second Sex. English, sociology, political science, and women’s studies for Judith Butler’s Gender Trouble. Sociology and women’s studies for Judith Lorber’s Social Construction of Gender. Women’s studies is the dominant field for some titles, but these are far down the overall title ranks and are mostly associated with the history of feminism and reproductive rights. Estelle Freedman’s No Turning Back, Barbara Findlen’s Listen Up, and Margaret Sanger’s My Fight for Birth Control are examples.

Small fields are not doomed to invisibility in this model. Here, for example, is the Republic of Dentistry (Galaxy Link). And the only slightly more connected cluster for dance (Galaxy Link). Fields that share few titles with their neighbors become islands, such as Fitness Island, which pulls together physical education, coaching, and physical therapy.

Field Brain

It is possible, of course, to create maps that reflect these lower levels of organization. The Open Syllabus dataset can be cropped down to focus on individual fields and smaller numbers of titles. Such restrictions produce more familiar intellectual landscapes organized around teaching canons. Here, for example, are the top six hundred or so titles assigned in sociology since 2015.

This map—a print available at the Open Syllabus Print Store—comes closer to student experience of the sociology curriculum. The main themes and divisions are all there: methods texts are in green on the upper left; different flavors of social theory are spread across the top center and right; gender is at the bottom right; work and labor are in the middle; and various approaches to race, ethnicity, and class occupy the lower left quadrant. But the other fields that pull and fragment sociology in the Galaxy are removed. It’s field brain, not galaxy brain. Here’s what that looks like for two more fields, classics and architecture, which from the Galaxy perspective are heavily interdisciplinary.

As I said at the beginning, we expected the Galaxy to generate discussions about fields. Since academics are, by definition, trained into fields, there’s a point of entry for everyone. But it didn’t, and this piece is in the odd position of trying to explain why the dog didn’t bark.

Part of the answer is that habits of interpretation for these types of plots are underdeveloped, which is not surprising for a new visual modality. Another may be that the conditions of interpretation—how one begins to have confidence in an interpretation or in the plot more generally—are too opaque for casual engagement, with too much required contextual knowledge about the plot and the data for a visitor to venture interpretations about what they see. And if this is true of the Galaxy, where all academics have at least some “ground truth” to draw on, it is almost certainly true of the larger category of large-scale visualizations to which this belongs. It may be that these plots work less as general objects to think with than as the scaffolding for guided stories through the data, like those I have tried to tell here. Or it may be possible to provide a set of loose rules to guide interpretation.

This piece has tried to do some of both, because Weber’s problems are still with us, and because new forms of representation always require new forms of literacy.

There is more to say, of course. I haven’t made much reference to the dataset, our collection methods, the numerous catalogs and other taxonomies on which we draw, or the other infrastructure behind the work. Caveats about all of those could fill another article. The map, as they say, is not the territory. The important question is whether it’s close enough to it to learn from. I’d argue that it is, but you live in the Galaxy too, and should decide if you feel at home.

You can read other Open Syllabus data stories here or follow on Twitter. If you want to support Open Syllabus’ work, send us your syllabi, buy a poster, or help us get your institution involved. We have ways to make that interesting.

This article was commissioned by Nicholas Dames.

Featured image: The Open Syllabus Galaxy.