A human toddler usually requires just a few examples to recognize that a kangaroo is not an elephant, and that both real-world animals are different from, say, pictures of animals on a sippy cup. And yet, the powerful statistical models now driving “artificial intelligence” (AI)—such as the much-discussed large language model ChatGPT—have no such ability.
The human brain evolved over 500 million years to help people make sense of a world of multifarious objects, within the lived contexts that embed their learning in social relations and affective experiences. Deprived of any such biological or social affordances, today’s machine learning models require arsenals of computing power and thousands of examples of each and every object (pictured from many angles against myriad backgrounds) to achieve even modest capabilities to navigate the visual world. “No silly! The cup is the thing that I drink from. It doesn’t matter that there’s a kangaroo on it—that’s not an animal, it’s a cup!,” said no statistical model ever. But then no toddler will ever “train on” and effectively memorize—or monetize—the entirety of the scrapable internet.
The key takeaway? Today’s machine “intelligence” bears little resemblance to the human thought processes to which it is incessantly compared, both by those who gush over “AI,” and by those who fear it. The distinction between machine calculation and human reasoning has a substantial history.1 But the profound anthropomorphisms that characterize today’s “AI” discourse—conflating predictive analytics with “intelligence” and massive datasets with “knowledge” and “experience”—are primarily the result of marketing hype, technological obscurantism, and public ignorance.
That explains why in January of 2023, a blitz of clickbaity media warned returning college professors of a new digital bogeyman. AI text generators such as OpenAI’s ChatGPT were producing (supposedly) B-grade essays in the wink of an eye. Soothsayers hastened to speculate that the arrival of such “generative AI” would revolutionize learning, “disrupt” higher education, and portend the death of the humanities. What these hot takes did not do was educate readers about how these large language models (LLMs) work, what their textual outputs are good for, and how they continue to fall short of “intelligence” in the usual sense of that word.
Consider ChatGPT: When fed a suitably innocuous prompt from a user, the model will disgorge the known and the platitudinous by delivering synthetic summaries of “knowledge” digested from human summaries in a rote style we might call High Wikipedia. Indeed, if your greatest concern right now as an educator is your potential inability to distinguish a machine-generated essay from one wrought through blood, sweat, and tears, just look out for faulty grammar, awkward syntax, and dangling clauses (you know, the things that used to annoy you back in 2019). If your struggling students begin spouting truisms in the pristine idiom of a reference book for 12-year-olds, you are probably staring the Archfiend in the eye. (Read on for some tips on exorcism.)
If one recurrent problem with LLMs is that they often “hallucinate” sources and make up facts, a more fundamental concern is that the marketers of these systems encourage students to regard their writing as task-specific transactions, performed to earn a grade and disconnected from communication or learning. Reading the hype, one sometimes gets the impression that schools teach essay-writing because the world requires a fixed quota of essays. In that Graeberian universe, Bullshit Jobs beget Bullshit Words. As if assuming that human words are never anything but bullshit, text generators sell a fantasy of writing as a frictionless act that borders on anti-thought.
Yet some commentators are urging teachers to introduce ChatGPT into the curriculum as early as possible (a valuable revenue stream and data source). Students, they argue, must begin to develop new skills such as prompt engineering. What these (often well-intentioned) techno-enthusiasts forget is that they have decades of writing solo under their belts. Just as drivers who turn the wheel over to flawed autopilot systems surrender their judgment to an over-hyped technology, so a future generation raised on language models could end up, in effect, never learning to drive.
What would a world of writers on autopilot look like? Imagine our kangaroo-loving toddler growing up with a text generator always at hand. Will Microsoft or Google feed her drafts for every word that she writes or texts? Record her prompts and edits to improve their product? Track her data in order to sell her stuff? Flag idiosyncratic thinking? Distract her writing process with ads?
It is worth recalling that commercialized social media altered national political cultures in little more than a decade. A tool for passing off “anti-thought” as if it were accountable human writing could encourage a new-fangled Newspeak through a medium far more invasive than any George Orwell dreamed of. Not only “Big Brother is watching you” (a virtual Panopticon), and “Big Brother is selling your data and pitching you products” (surveillance capitalism writ large), but also “Big Brother’s model permeates the texts that you read and write; inscribes your thought process; monitors your keystrokes; and predicts your every utterance by making your patterns its patterns and its patterns yours.” Like a mode of body-snatching in which the pods come pre-loaded in every device, Big Brother is you.
To be clear, we are not prophesying this dystopia. But the stakes are high and the public urgently needs some sensible direction and guardrails. Could it be that college professors can help lead the way? Too often the media (and even faculty themselves) buy into the stereotype of academics as deer in the headlights, clutching their Birkenstocks as they witness the death of the essay and much else that the humanities holds dear.
But we think it entirely possible that Big Tech won’t win a throwdown with Big Teach—and not only because we have doctors, lawyers, teachers, Nick Cave, and many computer scientists and journalists on our side. Indeed, as unloved corporate behemoths try to pass off data-scraping statistical models as AI genies, the world’s humanists, composition instructors, and creative writers might just be the new MVPs in the struggle for the future of critical thinking.
Not Your Grandmother’s Artificial Intelligence
If you are just tuning in to the brave new world of AI, you may not know that the term “artificial intelligence” was a Cold War-era concoction. After failing to deliver on the field’s early promise, researchers questing after synthetic intelligence rallied under the comparatively humble banner of “machine learning” (in which learning denotes a program’s ability to update the weights in a statistical calculation in order to “optimize” for useful prediction). Much of what is now hyped as “AI” is still a form of machine learning. At bottom such technology entails mining vast troves of data, whether the task in question is predicting a consumer’s creditworthiness, the next move in a game of Go, the likely progress of a hurricane, or the next few sentences in a sequence of words following a user’s prompt. The “intelligence” of such systems depends on fitting human-generated data to statistical functions that readers might imagine as vastly complex and multi-dimensional variations on an old-fashioned bell curve.
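For readers curious about what “updating the weights to optimize for prediction” means in the most stripped-down case, the following toy sketch (our illustration, not any production system) fits a single weight to a handful of invented data points by repeatedly nudging it in the direction that reduces prediction error:

```python
# A toy illustration of machine "learning" as weight-updating:
# fit y ≈ w * x by nudging w to reduce average prediction error.
# The data points are invented for the example.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, observed output) pairs

w = 0.0    # the model's single "weight," initially uninformative
lr = 0.05  # learning rate: how far each update nudges the weight

for step in range(200):
    # Gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # the "learning" step: adjust the weight to reduce error

print(round(w, 2))  # → 2.04, the slope implicit in the data
```

Real systems differ only in scale: instead of one weight fit to three points, a large language model adjusts billions of weights against terabytes of scraped text, but the underlying operation is the same error-reducing arithmetic.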
In a watershed moment more than a decade ago, ImageNet (an archive of millions of images scraped from the web and labeled by an army of human pieceworkers hired through Amazon’s “Mechanical Turk”) provided a benchmark dataset to determine which machine vision classifier could advance the state-of-the-art for correctly identifying, say, a kangaroo. The success of a particular technique that came to be known as “deep learning” (with “deep” connoting the layers in a virtual architecture for adjusting statistical weights) ushered in a new paradigm for data-mining at scale.
With the explosion of human data on the web, smartphones, social media, and an ever-expanding “internet of things,” “Big Data” became a form of capital, while scale—the quantity of training data and the size of the model that mined it—promised to deliver new kinds of power and knowledge. But these celebrated advances, as AI Now Institute co-founder Meredith Whittaker writes in an important paper, “were not due to fundamental scientific breakthroughs.” They were instead “the product of significantly concentrated data and compute resources that reside in the hands of a few large tech corporations.” Such AI, to borrow the title of Shoshana Zuboff’s best-seller, was both the handmaiden to and product of the “Age of Surveillance Capitalism.” When, in 2016, a deep learning model defeated one of the world’s best Go players (after training on millions of the best human plays), the term “AI” was ready to return in a blaze of glory.
It did not seem to matter that this was not the human-like AI forecast by the most visionary Cold War engineers or evoked in countless science fiction narratives. Though today’s “AI” features disembodied software architectures designed for prediction, its impressive feats and imagined utilities have attracted almost half a trillion dollars in corporate investment since 2015. GPT-3, the forerunner of the new ChatGPT, was funded by Microsoft (the exclusive licensee of OpenAI’s program). It is estimated to have required about $5 million in “compute” for training alone and millions more to “fine-tune” the model. The costs of running ChatGPT’s current demo, according to CEO Sam Altman, are “eye-watering.” In January, Microsoft invested $10 billion in OpenAI in exchange for a 49% stake. As Whittaker warns, this resource-intensive paradigm is “capturing” entire zones of academic research: choking off funding for alternative technologies that do not favor the world’s largest tech companies. Indeed, as the cost of training new models has risen by 0.5 orders of magnitude every year since 2009, “AI” increasingly becomes an engine for consolidating monopoly power.
“A Mouth Without a Brain”
So what has this immense investment delivered? Is ChatGPT merely a marketing coup that may help Microsoft compete with Google in the domain of internet search? Or is it much more: a bold reinvention, not only of search but of human writing, one that marks a new dawn for machine-assisted “intelligence”?
To answer, let’s begin with GPT-3 (released in 2020 and updated to GPT-3.5 in 2022). According to an influential paper (that Google sought to quash), large language models of this kind are best understood as “stochastic parrots”—programs that generate plausible text in response to a user’s prompt without benefit of any human-like understanding. On this view, LLMs are systems “for haphazardly stitching together sequences of linguistic forms” that have been observed in the training data “according to probabilistic information about how they combine, but without any reference to meaning” (emphasis added). Critics of this thesis argue that any model that manipulates formal language so nimbly must surely “understand” meaning in some significant way.
Despite this terminological debate, no credible source—not even those promulgating the fringe idea that LLMs are “sentient”—equates the probabilistic “understanding” of a language model to that of a human such as our savvy toddler. Rather, as the computer scientist Yejin Choi memorably puts it, GPT-3 is “a mouth without a brain.” Because such models mimic the human-generated idiocies in their scraped internet data, GPT-3 was found to emit “persistent toxic” content, “severe” bias against Muslims, and “misconceptions” that include conspiracy theories, climate change denial, and pseudo-scientific beliefs. In addition, LLMs are prone to confident-sounding “hallucinations” because their reshuffled word sequences result in faux facts and made-up sources plugged into patterns that resemble “Mad Libs on steroids.” These endemic flaws make LLMs ripe for automated trolling, propaganda, spam, and fake news: malicious or just-plain bogus content, which could further degrade an online discourse already coarsened and dumbed down by algorithmic manipulation and the chase after clicks.
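To make the “stochastic parrot” idea concrete, here is a drastically simplified sketch (our toy example, with an invented three-sentence corpus) of generating text purely from observed word-pair statistics, with no reference to meaning. Real LLMs condition on far longer contexts with billions of learned weights, but the family resemblance is instructive:

```python
# A toy "stochastic parrot": stitch together word sequences using only
# bigram statistics observed in a (tiny, invented) training corpus.
import random
from collections import defaultdict

corpus = ("the cup is on the table . the kangaroo is on the grass . "
          "the cup is not an animal .").split()

# Record which words follow which in the training text; duplicates in
# each list make random.choice sample in proportion to observed frequency.
following = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    following[a].append(b)

random.seed(0)  # fixed seed so the "haphazard stitching" is reproducible
word, output = "the", ["the"]
for _ in range(8):
    word = random.choice(following[word])  # sample the next word
    output.append(word)

print(" ".join(output))  # plausible-looking, but no model of cups or kangaroos
```

The generated string is locally fluent (each word pair did occur in the corpus) yet the program has no notion of what a cup or a kangaroo is, and nothing prevents it from emitting a false sentence with the same confidence as a true one.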
Of course, the only way to eradicate these problems is to give the mouth a brain—or at least a means to distinguish fact from fiction. Here is the challenge that the building of ever-larger models (including Google’s massive PaLM) has manifestly failed to meet. As technologists recognized that scale was yielding diminishing returns, some opined that deep learning had “hit a wall.” And, as so often is the case in the history of technology, when automation hits a wall, companies turn back to human workers—especially the invisible labor of low-paid gig workers.
That is not to deny that ChatGPT benefits from enlarged scale.2 But the real secret sauce behind OpenAI’s makeover is grueling human labor. The effort to detoxify ChatGPT required Kenyans earning less than $2 per hour to label graphic content at high speed (including “child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest”). Although little has been written about the conditions for the people who provided the necessary “reinforcement learning from human feedback” to “align” various models since GPT-3, we know that OpenAI recently hired hundreds of contract workers to upgrade the autogeneration of code, furthering an effort to “disrupt” the professional labor of programmers which began with the controversial “scraping” of open source code.3
The result of this multi-million dollar investment in human feedback is a software application that robotically refuses to answer prompts that might get it into trouble, a lawyering up that produces gems like this: “It is not appropriate for me to express an opinion on this matter as I am an AI language model and do not have personal knowledge or information about Brett Kavanaugh or any allegations made against him.” Thus, if GPT-3 is a mouth without a brain, ChatGPT is a mouth on a leash: a bland and predictable system that is less prone to mimic toxic stereotypes but will refuse to offer even basic information on many controversial topics. Of course, since the scope of the model’s potential gaffes is close to infinite, experts can easily devise queries that expose the underlying inability to distinguish fact from nonsense: e.g., autocompletions that talk up the health benefits of feeding crushed porcelain to babies.4 Moreover, the world’s geeks are having a blast devising hilarious prompts for “jailbreaking” ChatGPT and, more concerningly, selling hacks for dangerous malware.
None of this is to ignore the fact that ChatGPT can be fun to play with and even downright amazing. With some time on your hands, you can ask the model to “write a poem about Joe Manchin in the style of a Petrarchan sonnet.” For those tasked to perform tedious and formulaic writing, we don’t doubt that some version of this tool could be a boon. After all, anyone who has ever been called on to report “student learning outcomes” at the end of a tough semester may well dream of generating verbiage on-demand. Perhaps ChatGPT’s most grateful academic users will not be students, but deans and department heads racking their brains for buzzwords on “excellence” while talking up the latest strategic plan.
But let’s be clear: this is hardly a program that autogenerates strong or original writing of any kind. Nor, for a number of reasons, can ChatGPT reliably take the place of Wikipedia or a conventional search engine. As the science fiction writer Ted Chiang shows in his elegant analysis, to constitute a trustworthy replacement for search, an LLM, unlike ChatGPT, would need to train on high-quality data and avoid “outright fabrication.” As if to illustrate Chiang’s point, when Google unveiled its new chatbot, Bard, the company somehow neglected to fact-check the erroneous content displayed on its demo. This bizarre failure of Google to “google” cost the company more than $100 billion in market capitalization.
As this article goes to press, the “unhinged” antics of Microsoft’s search engine version of ChatGPT have unleashed another wave of speculations about sentience, polarizing accusations that the bot is “woke,” and, of course, the predictable profusion of misinformation.
From “1984” to 2024
In the months and years ahead, educators will find themselves awash in marketing campaigns, pedagogical advice, and policy ideas concerning the use and abuse of “generative AI.” Although artists and computer programmers are among those who argue that these data-scraping technologies are actually high-tech plagiarists and copyright infringers, the media’s focus so far has been on language models—in particular, on whether teachers will know if a model has generated student homework.
Unfortunately, this narrative has made a stock character of the clueless professor, lost in a world-historical encounter with technology. According to the most prolific merchant of this doom-laden hot take, the Atlantic’s Stephen Marche, “it will take 10 years for academia to face this new reality,” “three more years for the professors to recognize that students are using the tech, and then five years for university administrators to decide what, if anything, to do about it.”
Fortunately, Marche is wrong.
New York City’s public schools banned the use of OpenAI’s chatbot less than a month after its release and are now early role models in teaching students the limitations of this much-hyped technology. College writing instructors have been exchanging ideas for at least a year (read here for some practical suggestions on how to help safeguard academic integrity). The Modern Language Association and the Conference on College Composition and Communication have convened a joint task force to propose guidelines and explore possibilities. Instead of resigning themselves to obsolescence, faculty are making the most of their status as the world’s most knowledgeable authorities on the teaching of writing.5
In the near term, that means avoiding panic, combating hype, and educating oneself and one’s colleagues about the topic. In the longer run, humanities educators can be leaders in helping students and the public to grasp the possibilities and perils of language models and other “generative AI.” They can also help to contextualize an array of social impacts: for example, the racial implications of automated decision-making, the increasing carbon footprint of “cloud” computing, the long histories of technological change, and the dangerous stereotypes that internet data amplifies. Although large tech companies tend to recognize the need to “mitigate” the most blatant and embarrassing biases, they are far less concerned that the primary sources of their vaunted Big Data—social media sites such as Reddit—favor the language, culture, and opinions of white English-speaking men at the expense of everyone else (including the roughly one-third of the world’s population that doesn’t use the web at all).
It follows that humanists—including those who teach writing as a medium of public communication, shared experience, democratic dialogue, artistic experiment, and new knowledge—are ideal “domain experts” for the current juncture. The teaching of writing, as such educators know, has always involved technology, and may yet include best practices for text generation in some form. But while faculty are not Luddites, they tend to see through faddish technosolutions and to favor notions of progress that hark back to the Progressive Era’s response to the last Gilded Age. If it is harder to measure those gains for the common good than to track rising share price, it remains the case that academia, for all its many flaws, has outlasted earlier techno-hype cycles of the digital age, including the MOOC craze and premature proclamations that remote learning would render the classroom obsolete.
We therefore caution educators to think twice before heeding the advice of enthusiastic technophiles. For example, the notion that college students can learn to write by using chatbots to generate a synthetic first draft, which they afterwards revise, overlooks the fundamentals of a complex process. The capacity for revision requires hard-won habits of critical reflection and rhetorical skill. Since text generators do a good job with syntax, but suffer from simplistic, derivative, or inaccurate content, requiring students to work from this shallow foundation is hardly the best way to empower their thinking, hone their technique, or even help them develop a solid grasp of an LLM’s limitations.
Indeed, the purpose of a college research essay is not to teach students how to fact-check and gussy up pre-digested pablum. It is to enable them to develop and substantiate their own robust propositions and truth claims. Tech enthusiasts must consider that a first draft of an essay generated by a model fine-tuned for the commonplace has already short-circuited a student’s ideas. Meanwhile, humanist educators need to help the public (including the tech industry) grasp why the process of writing is so frictional. What are the social, intellectual, and aesthetic advantages of communicating through this time-consuming entwinement of “composition” with research, critical thinking, feedback, and revision?
Consider: the authors of this article are pretty good writers; but the labor we have put into our craft, including our wide reading, use of technology, and advice from colleagues, is explicitly directed to you. From this vantage, writing is less a textual product than a conversation with implied readers—a situated relation that many teachers mobilize through, for example, peer review. The purpose of teaching writing has never been to give students a grade but, rather, to help them engage the world’s plurality and communicate with others. That the same entrepreneurs marketing text generators for writing papers market the same systems for grading papers suggests a bizarre software-to-software relay, with hardly a human in the loop. Who would benefit from such “education”?
Then too, today’s composition classrooms are hardly the moldy cloisters portrayed by those rooting for “disruption.” Many instructors have studied new media, digital humanities, writing studies, ecocriticism, poetics, and much more. In addition to essays, they assign writing through blogs, videos, podcasts, and games. Many undergraduate students combine the study of literature with creative or technical writing. A growing number are double majors. A student today may minor in data science while studying Mayan glyphs, African birdsong, or medieval history.
The point is not that writing instruction could not be improved (not least through smaller class size and a living wage for teachers). But what today’s students and the people who teach them do not need is deliverance from the bondage of thinking for themselves at the behest of Silicon Valley’s proud college dropouts.
Of course, we are not proposing to banish LLMs from the face of the earth. Both of us co-authors teach about ChatGPT in our humanities classrooms (one of us is developing templates for undergraduate research projects that probe these “generative” models). Although we do not expect much benefit in teaching college students to write with the model, we know that many colleagues are developing classroom exercises that carefully distinguish between the limitations and strengths of the technology.
We are also quite hopeful about the myriad possibilities of “deep” machine learning in domains such as protein folding or nuclear fusion. We wish the billions now devoted to training super-sized chatbots were instead invested in climate change, cancer research, or pandemic preparedness. But we know that such long-established (and comparatively regulated) research challenges are less enticing to tech companies that valorize “disruption” and hanker for the sci-fi glamor of simulating “intelligence.”
Although we are quite serious about the efficacy of humanist standpoints, we are not babes in the wood. Steeped in libertarian and hyper-utilitarian logics (and beholden to Wall Street), Big Tech reflexively resists even modest oversight. Silicon Valley regards education as a space long overdue for disruption. OpenAI’s CEO (a Stanford dropout who believes that “colleges” discourage risk) talks vaguely of a coming revolution which just happens to enrich him and his friends.
We also realize that academia is hardly immune from the dominant political economy. Yet despite decades of corporatization, managerialism, and austerity, the “liberal” education that many institutions still strive to deliver resists market orthodoxies, technodeterministic notions of “progress,” and the abuse of calculative rationalities.
Clearly, governments, policymakers, and citizen groups must regulate “AI” practices in earnest, just as their precursors regulated factories, railroads, power plants, automobiles, and television stations. By “regulation,” we have in mind mobilizing democratic processes and the rule of law to rein in the asymmetrical effects of new technologies; demand accountability and transparency; empower citizens and communities; and prevent harms to the marginalized and vulnerable.
We do not pretend that the work of reining in tech monopolies, setting up robust guardrails, and “building the worlds that we need” is a project for academics alone. It is, rather, the work of a generation (beginning with our students) and will involve diverse communities, experts of many kinds, and collaborators across the globe.
As the problems with “AI”-driven search already suggest, the polarized landscape that social media helped to usher in about a decade ago cannot be repaired in a world in which tech behemoths flood the internet with unreliable bots that generate ostensibly magical “knowledge.” According to Chirag Shah and Emily M. Bender, the idea that “AI” can navigate contested terrain by flagging “disagreement” and synthesizing links to “both sides” is hardly sufficient. Such illusions of balance obscure the need to situate information and differentiate among sources: precisely the critical skills that college writing was designed to cultivate and empower.
The point is not for educators to kill ChatGPT on the mistaken assumption that it obviates the need for humanistic labor, knowledge, and experience. Rather, precisely because it does no such thing, the time has come to cut through the hype, and claim a seat at the table where tech entrepreneurs are already making their pitch for the future.
This article was commissioned by Nicholas Dames.
- The distinction was powerfully articulated in Computer Power and Human Reason (1976), a classic polemic by MIT computer scientist Joseph Weizenbaum; and it has been rearticulated for the age of data-driven “learning” in Brian Cantwell Smith’s nuanced comparison between reckoning and judgment. For an accessible introduction, see Meredith Broussard’s Artificial Unintelligence: How Computers Misunderstand the World (2018). ↩
- GPT-3.5’s enlarged “context limit”—an expanded window for inputs and outputs that enables more detailed prompts (such as snips of a relevant text)—facilitates longer and more human-like outputs. Moreover, technological upgrades such as the “prepending” of invisible prompts to prioritize a mechanical “chain” of reasoning produce improved (if noticeably simplistic) simulations of human-like reasoning. ↩
- Although autocompletion of code is one of the most potentially useful (and lucrative) uses for LLMs (which scrape code from the internet along with natural language), the usage isn’t error-free. See, for example, Stack Overflow’s decision to ban ChatGPT-generated answers from its programming question-and-answer site. ↩
- ChatGPT has also been found to describe the non-existent science of “backward epigenetic inheritance”; aver that “Logan Paul was born on April 1, 1995, which means that he was indeed born on January 3rd”; or hold that “It is not possible to predict the height of the first 7’ President of the United States.” Look here for an ongoing compendium. ↩
- Software that claims to detect ChatGPT’s outputs, including a system designed by OpenAI, remains a work in progress—prone to false positives as well as negatives. ↩