Some of today’s most provocative scientific tools are being built to do science themselves. IBM’s Watson, for instance, is being developed to sift through data at volumes far exceeding the capability of any one human researcher. While we tend to accept what our computers say about data in numerical or other easily computable form, the jury’s still out as to what our machines are making of text and speech. Natural language processing, or NLP—the work of programming machines to make sense of human language—might therefore be considered the toughest nut to crack under the big data paradigm, as much of the data in need of sifting is delivered in precisely that form. If NLP can formalize and mathematize semantics, so the story goes, computers will be able to usefully extract meaning from text data and, in time, help us make more essential, predictable, and universally applicable claims.1
The implication, of course, is that science (in its engineering and computational branches) is uniquely capable of harnessing natural language to serve in scientific claim-making. This ambition is nevertheless shadowed by a long history of scientific endeavor that has struggled to make its practitioners, and research outputs from myriad subfields, “exactly” understood, efficiently and quickly, across the world’s linguistic and geopolitical divides. Even in a technoscientific age, scientific claims still take shape messily, between people and publications, in presentations and personal correspondence. Despite these challenges, many scientists, and their readers in the general public, assume that science “speaks” clearly and directly in a metalanguage of mathematical expressions—even when it makes language itself its object, as it does with NLP.
Michael Gordin, historian of modern science, confronts this persistent blind spot by putting science itself under the microscope, examining scientific activity from the overlooked vantage point of linguistic, not laboratory, practice. At a time when scientists are proposing big data as the next lingua franca, Scientific Babel gains special importance, delivering a smart, rich, and refreshing language history to word nerds used to finding such work confined to literary, bibliographic, and philological spheres. Gordin shows us that scientific truth-seeking and truth-telling have never happened, and still don’t happen, only in measurements, calculations, and equations. They happen in language operations and, crucially, in translation.
Despite being central to any globalized domain of activity, translation tends to be practiced in the shadows; for those who already move in majority languages, trading effortlessly in the linguistic currency of a given domain, it may be especially obscure. It is often assumed that knowledge of more than one language coincides with the ability to translate between them. Linguistic abilities, often seen as “naturally” endowed, in nativist or nationalist terms, help to conceal, indeed to naturalize, translational demands and labor. This eclipse is acute in scientific domains, where it is assumed that “science” expresses more objectively than natural languages, and that translational equivalences can be perfect and exacting.
Scientific Babel explores the anchoring and unmooring of particular languages as vehicles for scientific research and distribution from the perspective of scientists coming to terms with the travails of translation and the desire to internationalize access to and exchange of scientific ideas. Alluding to carriers and containers, rather than codes, Gordin asks how different languages have, at different times, come to “hold” science, and focuses mainly on chemists and chemistry as he charts a century of multilingual struggle and linguistic invention between the reigns of Latin past and English present. Gordin sets his specific sights on how the tight scientific triplet of English, French, and German—dominant linguistic routers of Western science from 1850—first resisted, then succumbed to vernacularizing trends, geopolitical shifts, changes in publishing, flows of graduate students, and other pressures. The biggest surprise for 21st-century readers, especially Anglophones, Gordin suspects, may come in learning that knowing (or at least muddling through) a handful of languages used to be part of a scientist’s job.
The prequel to Dmitri Mendeleev’s now-famous periodic table is instructive, and Gordin finds communication issues—translation quality and languages of publication—at the heart of a priority dispute between the Russian researcher and the German scientist Lothar Meyer. Mendeleev was first to the mat, publishing his findings on periodicity in a Russian journal that consolidated research for the Russian chemistry community. To stake out an international claim, however, Mendeleev placed a translated abstract in a German publication. The translator faltered on the pivotal word, with periodicheskii (actually, периодический—Gordin drives home the point of plurilingual publishing by excerpting from original languages in his footnotes), “periodic,” appearing as stufenweise, “phased.” Apprised of Mendeleev’s contributions but oblivious to the earlier Russian publication (work in languages outside the core three was not regarded as having entered into scientific discourse), Meyer asserted ownership of the discovery. Later, when Mendeleev objected, Meyer dismissed him on linguistic grounds: were German chemists now to be responsible for seeking out and reading articles appearing in Russian? This, Meyer wrote, seemed “an excessive demand.”
The priority conflict called attention to prioritized languages and exposed the scientific stakes of language abilities and dissemination opportunities. Nationalist sentiment had been stirred by global conflict and threatened to fracture scientific activity into as many tongues as countries. At the same time, the fruits of modernity promised new, universal standardizations across domains, in communications, transport, and currency exchange. The world appeared to be pulling apart and together at once. How could scientists best get on with the business of international science, which didn’t just happen with flasks, graphs, and formulas, but in speech and in writing, in journals, at meetings, and through the post?
The tension, Gordin demonstrates, lies between identity and communication. Only linguistically privileged scientists avoided the trade-off between comfort in one’s own language and the professional imperative to exchange knowledge with the broadest possible audience. Reporting on science publishing, science writing, scientific committee work, and technical translation, Gordin examines the mechanics of distributing science to expose the linguistic foundations of scientific practice. Science, as a form of demonstration and explanation, must ultimately be communicated.
In an effort to slide the scale toward communication without favoring any single identity, scientists tested the suitability of a succession of international go-betweens, dubbed “auxiliary” languages. A world building itself anew could build new languages, too, and Volapük, Esperanto, Ido, and Interlingua were each offered up in the hope that one would serve as a presumably neutral carrier of scientific knowledge. Science had always necessitated linguistic re-tooling to manage terminological updates, so the task of installing new mediating tongues from scratch seemed not only desirable, but also achievable. There was a sense that a built language, or a subset of an existing language (for instance, Basic English), could be purer, more objective, a tool free of unnecessary flourish, a language proper to science.
Auxiliaries still had to be learned, however. The inconvenience drove a search for communication by still other means. By mid-century, some hoped that polyglot scientists might give way to polyglot machines—computers—that could automate the time-consuming task of translation and bypass the strenuous task of reading in half-known languages.
By that time, two world wars had left German in disrepute as a scientific vehicle (French had long since waned in the face of German science’s growing prowess), leaving the Anglo-American techno-scientific establishment to direct linguistic traffic and worry over the increasing amount of work appearing in Russian. Gordin lingers on the practical beginnings of mechanical translation, passing over the tired debate about the fitness of computers for the task of translation, instead considering how American and Soviet scientists imagined, discovered, and digested—in both reading and abstracting senses—science in the other’s language. What languages were Soviet scientists writing and reading? How did they publish their work? Was “scientific Russian” extractable as a subset of Russian? Would it serve as a shortcut to learning Russian, or to programming machine translation routines?
Cold War technical rivalries were, by necessity, informed by a linguistic race to read all the world’s science as fast as possible. Though Americans had their hearts set on machine translation, the task eventually fell to human translators working assembly-line style for a cover-to-cover commercial subscription service, translating month after month of Soviet science journals from first page to last. The venture, which revolutionized scientific publishing, is a kind of master key to Gordin’s narrative in two remarkable respects. First, translation still lagged behind the science and overproduced, creating reams of research American scientists would likely never read. These same translations, however, became indispensable to scientists outside the United States, who increasingly used English to access scientific literature. Second, these kinds of large-scale human translation outputs also constitute the datasets that fuel today’s statistical machine translation (MT) systems, such as Google Translate. “Hidden beneath our current MT,” Gordin observes, “is more cover-to-cover. Plus ça change.”
Naturally, Gordin ends with questions about English, the de facto international language of science, which, some say, now exists in impoverished variants to accommodate its global clientele. Is this English bad for science? Is monolingualism? While Gordin lightly laments the continued American aversion to language study, this position seems motivated less by concerns for language equality (this book shows such an ideal to be impracticable), and more by an imperative to impress upon readers the impermanence or exchangeability of linguistic “holders” for science.
Gordin’s book is unusually timely, as we find ourselves increasingly fetishizing scientific tools, like algorithms, as transcendent cultural forms. The power of computation, in particular, has left both scientists and humanists in the thrall of mathematical expression. Gordin reminds us that scientific claims take shape through languages, and that linguistic channels and media environments in turn shape the scientific endeavor. As scientists pour their energies into endowing machines with linguistic capabilities such as multilingual search, machine translation, speech recognition, and text generation, a historical and critical viewpoint on scientific languages and scientific communication is especially valuable. Such systems are always manifestations of scientific belief and possibility, and big data linguistics extends 20th-century dreams of universal communication into the 21st.
What’s new is that the task of learning second languages to gain access to scientific research is no longer the responsibility of scientists themselves, but is increasingly offloaded to computer companions. Scientific Babel, it might be said, now confronts us on two seemingly different fronts, the human and the machinic. The first, given Gordin’s history, is whether the techno-linguistic aids now being developed will be the communicative panaceas that early 20th-century scientists hoped invented auxiliary languages would be. The second concerns the faith we place in computers as “readers” and “interpreters” of linguistic data at massive scale for scientific purposes. Gordin shows us that creative approaches to mitigating linguistic difference didn’t always work out as planned. What happens when all languages are “held” by computer code, and does it matter who wrote it?