I’m usually not very worried about robots taking over the world. Skynet makes for entertaining science fiction, but the artificial intelligences we have now don’t keep me up at night worrying about Terminators busting through my door. Nevertheless, I did start to pay attention when I heard about computers beating humans at Go, the ancient Chinese board game.
Not that I was so surprised by the results themselves—after all, machines have been chess champions for a while. Rather, what interested me was how the human players described the nature of their defeats: like playing against an “alien,” one master commented, or against someone from another dimension. In a few short days of learning by playing against itself, the machine learning program AlphaGo Zero seemed to have evolved strategies that no human had happened upon in thousands of years of Go playing.
So not only was the computer winning; it was winning in ways not even its creators seem to understand. That should make us a little nervous. At least we know why the Terminators want to kill us.
As machine learning gets increasingly sophisticated and powerful, such incomprehensibility is becoming more and more widespread. Do we know why Google places one search result above another? Do we know why Uncle Bob is always popping up on our Facebook feed? More importantly, does someone at Google or Facebook know? If they don’t, we might need to rethink the algorithms we want and the kinds of power we’ve given them over our lives.
Machine learning algorithms now dominate much of what goes on behind our screens. They make decisions about everything from what ads we see online to who gets a home loan.
Cathy O’Neil’s Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy describes the kinds of dangerous consequences these algorithms can have. For example, O’Neil details how police departments are increasingly using software with names such as Predpol, Compstat, and HunchLab to target areas where the most crimes occur. These models aggregate crime data with geographical information, such as the location of ATMs and convenience stores, in order to make predictions about where crimes will occur. Police are then preferentially assigned to these zones. This would seem to make sense. Except that when police are deployed to any neighborhood, they are more likely to detect crimes there, especially petty crimes such as vagrancy and small-scale drug infractions. The data collected about these crimes is fed back into the software, making those neighborhoods appear even more dangerous and justifying more policing.
This kind of “pernicious feedback loop” is central to what O’Neil calls a “weapon of math destruction.” Predpol and other crime prediction software programs amplify small differences between neighborhoods into disproportionate levels of policing, arrests, and imprisonment. As such models spiral forward, they are rarely tested. Algorithms used to screen job applicants, for instance, do not follow up on candidates they reject. An applicant who is rejected by an algorithm in one application pool may go on to excel at another organization. Yet this crucial data point is not incorporated into the model or used to refine it. A similar pattern can be found in algorithms used to target online ads, approve loans, and compute insurance premiums.
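The amplification O’Neil describes can be made concrete with a toy simulation. All the numbers below are invented, and the model assumes, purely for illustration, that patrols are assigned disproportionately to whichever neighborhood has more recorded crime (here, in proportion to the square of its recorded count), while recorded crime depends on patrol levels rather than on actual crime:

```python
# Toy model of a pernicious feedback loop: two neighborhoods with
# IDENTICAL true crime rates. A tiny initial difference in recorded
# crime steers patrols, and patrols determine what gets recorded.
TRUE_RATE = 100            # actual crimes per period, same in both
DETECTION_PER_PATROL = 0.1 # fraction of crimes detected per patrol unit
PATROLS_TOTAL = 10         # fixed number of patrol units to allocate

recorded = {"A": 11.0, "B": 10.0}  # small, arbitrary starting gap

for period in range(5):
    # Patrols go disproportionately to the "worse-looking" neighborhood...
    weight = {n: recorded[n] ** 2 for n in recorded}
    total = sum(weight.values())
    patrols = {n: PATROLS_TOTAL * weight[n] / total for n in recorded}
    # ...and recorded crime tracks patrols, not actual crime.
    recorded = {n: TRUE_RATE * DETECTION_PER_PATROL * patrols[n]
                for n in recorded}

print({n: round(v, 1) for n, v in recorded.items()})
```

After a handful of periods, neighborhood A accounts for the vast majority of recorded crime, even though the two neighborhoods are, by construction, equally dangerous. The unchecked loop converts an initial 11-to-10 difference into a self-justifying disparity.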
More generally, O’Neil is concerned with how the movement of data and the categorization of people reinforce inequalities, cement prejudices and power relations, and generate unfairness, especially for the poor and minorities. Algorithmic calculations are largely done in the dark. Both the means by which data is collected—what is collected, when, and how—and the ways algorithms operate on that data are hidden from view. This lack of transparency makes problems and inequity proliferate invisibly.
How might we come to terms with this lack of transparency? This question is central to Adrian Mackenzie’s Machine Learners: Archaeology of a Data Practice, and the book is an experiment in how to critique what we cannot wholly grasp. As their name suggests, machine learning algorithms learn for themselves—they don’t rely on human programmers to make connections or see patterns. Instead they evolve their own ways of seeing and sensing.
A machine learning program to detect spam emails, for example, typically relies on an algorithm called a Naive Bayes Classifier. The classifier is “trained” by examining a set of spam and non-spam emails; using these it calculates the probability that particular words will appear in spam. (“Viagra,” for instance, is very likely to appear in spam and quite unlikely to appear in non-spam.) Presented with a new email, the Naive Bayes Classifier simply multiplies the probabilities of all the words appearing in that email. If the product is greater than some threshold, then the algorithm “decides” the email is spam.
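The procedure can be sketched in a few lines. The per-word probabilities and the threshold below are invented for illustration; a real classifier would estimate them from training emails (and would typically work with sums of logarithms to avoid numerical underflow):

```python
# Toy Naive Bayes spam check. The per-word probabilities are invented;
# a real classifier estimates them from a training set of emails.
P_WORD_GIVEN_SPAM = {"viagra": 0.80, "free": 0.60, "meeting": 0.05}
P_WORD_GIVEN_HAM = {"viagra": 0.01, "free": 0.20, "meeting": 0.50}

def is_spam(words, threshold=1.0):
    """Multiply each known word's spam/non-spam probability ratio;
    a product above the threshold means the email is "decided" spam."""
    score = 1.0
    for word in words:
        if word in P_WORD_GIVEN_SPAM:
            score *= P_WORD_GIVEN_SPAM[word] / P_WORD_GIVEN_HAM[word]
    return score > threshold

print(is_spam(["free", "viagra", "now"]))  # ratio 80 * 3 = 240 -> True
print(is_spam(["meeting", "tomorrow"]))    # ratio 0.1 -> False
```

Note that the classifier’s “explanation” for any given decision is nothing more than this product of probabilities.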
As Mackenzie argues, this approach has nothing to do with how humans might look for spam—it ignores “structural and semiotic” features of texts. If we were to ask why the classifier classified any particular email as spam, the explanation—based on a set of probabilities—would seem very strange and unsatisfying to us. Of course, we could go back and tally all the probabilities for ourselves, but this would be a very different (even “alien”) way for a human to make a decision. Although we might follow the logic, it wouldn’t really explain to us—or help us to understand—why one email is spam and another not.
These kinds of algorithms are remarkably successful. Machine learning has driven major advances in many domains, including internet search, machine translation, and image, face, and speech recognition. Indeed, such breakthroughs are behind much of the recent hype over AI.
Programmers have traditionally been thought of as engineers, building software just like civil engineers build bridges. But Pedro Domingos, one of the fathers of machine learning, considers his work to be “more like farming, which lets nature do most of the work. Farmers combine seeds with nutrients to grow crops. Learners combine knowledge with data to grow programs.” Farmers don’t need to know much about the biology of plants in order to grow them successfully; there is no need for them to account precisely for how particular nutrients are converted into green matter, for instance. Some farmers might even let their fields run wild, allowing all kinds of varieties to take root and sprout.
For Mackenzie, “farming” represents a fundamental change in programming practice. But it also indicates a fundamental change in our relationship to machines. If programmers are farmers now—rather than builders or engineers—then what insight can we expect them to have into the workings of their own products? Programmer-farmers are loosening control over and responsibility for their creations just when we need more control and more responsibility.
Machine learning has come into the spotlight just as we are increasing our scrutiny of algorithms and data. We are more and more worried not simply about privacy but about the dark powers that algorithms and software seem to be wielding. And we want someone to take responsibility for how software works and to do something about it. But how do we take responsibility for something we cannot comprehend?
This predicament can be described as what Frank Pasquale calls a “black box society”—one where the most important decisions about our lives are made using data about us we don’t know exists, within companies we cannot access, using mathematics we don’t understand.
One of the central examples of this condition, for both O’Neil and Pasquale, is the financial meltdown of 2008. Significantly, O’Neil’s concern with big data practices grew out of her disillusionment with working on Wall Street. For Pasquale, the finance sector is emblematic of how opacity aligns with profits: that very few people could understand collateralized debt obligations and credit default swaps was a large part of how these instruments could make money. And this black boxing was also one of the main factors responsible for the financial crisis.
But what was true for financial information in the years leading up to 2008 is increasingly true of other forms of information: the fewer people that have the data or understand what is being done with it, the more money there is to be made. When a Google search lands a particular website at the bottom of the page, it’s impossible to tell whether this is anti-competitive behavior or Google’s PageRank doing its job. When particular posts appear high up on your Facebook newsfeed, you cannot know whether it is because of their popularity or because someone is paying. “This confusion may be to Google or Facebook’s advantage, but it is not to ours,” Pasquale argues. As we grow increasingly reliant on these massive internet companies, the obscurity of their practices becomes increasingly worrying. If we can’t figure out how AlphaGo Zero is beating master Go players, what chance do we have against Google and Facebook?
Pasquale’s book is usefully focused on solutions for how we can wrest back control and open up the black boxes. He has several proposals: “fair reputation reporting” would allow individuals more knowledge about and control over how data about them is collected and used; a “Federal Search Commission” would establish clear rules for how search engines are ranking and rating people and companies; electronically captured and stored “audit trails” would allow greater transparency in financial dealings and instruments. Most radically, Pasquale argues that the sprawling government apparatus for domestic surveillance should be retargeted toward regulation of finance—a “CIA of Finance” or a “Corporate NSA” could be empowered to continuously monitor and investigate markets and firms, searching for “systemic risk” to the economy. This would surely be a better use of resources than spying on citizens.
The Black Box Society also articulates responses to those who might argue that the internet giants and their algorithms are too complex to regulate. Successes regulating the health care sector, in particular, suggest how big and powerful systems might be brought into line. During the early 2000s, Medicare and Medicaid administrators began to use software and private contractors to combat fraud and overbilling. The result was significant savings for the taxpayer, reduced waste, and increased efficiency in health care. These improvements occurred within a complicated field of actors, regulations, and vested interests—and in the face of opposition from big business and Congress. If health care can be improved, perhaps we should have hope for finance and search.
Pasquale argues forcefully for an “intelligible society,” in which many of the workings of data would be laid bare through public credit scoring, public internet companies, publicly available rules for ordering search, and even public banking: “It is time for us as citizens to demand that important decisions about our financial and communication infrastructures be made intelligible, soon, to independent reviewers—and that, over the years and the decades to come, they be made part of a public record available to us all.” The recent General Data Protection Regulation enacted by the European parliament is a step in this direction, mandating (among other things) that individuals have the right to inspect data about themselves that is collected and stored. But what is done with that data is also important: how is it possible to make the processing, as well as the data itself, open to scrutiny?
But ultimately Pasquale’s view is that perhaps complexity itself needs to be reined in. If we can’t understand an algorithm, if we can’t make sense of it, perhaps we shouldn’t use it. Extreme complexity may in fact be ungovernable, and therefore socially and politically undesirable.
But stepping back from complexity is going to be much harder than Pasquale seems to acknowledge. For one thing, it would entail scrapping much of the progress in artificial intelligence over the last decades. It would mean abandoning advances in natural language processing, voice recognition, machine translation, and other domains. In many cases it is their very lack of understandability that gives algorithms their power. This is especially the case with the engines running Google, Amazon, Facebook, and much else online. Are we prepared to wind all this backward?
As O’Neil realizes, such scrutiny and fairness may come at the expense of “dumbing down” our algorithms. The problem here is not just that vested interests and vested dollars stand in the way of such dumbing down; it may require a renunciation of machine learning altogether. Fairness and justice will not simply follow from rewriting our algorithms and opening up black boxes; the kind of accountability and understandability that fairness demands seems fundamentally inimical to the kinds of algorithms now driving our economies. If even programmers cannot understand how their own creations make decisions, how can the rest of us? Machine learners have an irreducible complexity that stands opposed to the kind of openness that O’Neil and Pasquale are rightly demanding.
This doesn’t mean we should just submit to our new algorithmic overlords. Certainly the combination of powerful private companies and complex algorithms creates a double-layered secrecy that we need to unravel. But even without corporate control there remains a trade-off between the kinds of transparency we want and the kinds of algorithms we’re increasingly reliant on. To put it another way: if we demand Go-playing machines whose moves we can always understand, those machines will probably never beat us. But so long as we’re in the business of farming—not building—our algorithms, they will continue to grow wild.
This article was commissioned by Caitlin Zaloom.