Chess, HAL-9000, and Unfortunate Tales of the Paperclip Maximizer

Photo Credit: Wikimedia Commons

“Thank you for a very enjoyable game.” — HAL 9000, 2001: A Space Odyssey

When thinking of AI, people often conjure up the Kubrick-esque image of a sleek metallic box, a piercing crimson eye, and a cool voice that exudes inhumanity.  Yet, whether through Facebook’s filtering systems or Microsoft’s ill-fated Tay chatbot, we interact daily with AI that doesn’t want to terminate us, at least not yet.  Even so, prominent figures in the scientific and technology communities, notably Elon Musk and the late Stephen Hawking, have classified AI as a future existential threat to mankind for three possible reasons:  Instrumental Convergence, the Technological Singularity, and the lack of a unified moral code.

Consider even the simplest AI, one whose sole purpose is to manufacture paper clips cheaply.  No sane engineer would consider spending thousands of man-hours with philosophers, mathematicians, and sociologists to harden such a trivial system to operate within the bounds of modern ethics.  However, left unsupervised and with its goals unbounded, the AI determines, “at any given moment there is something on Earth that is not a paper clip, but that I could use to make paper clips; thus, I should continue.”  And continue it does, obediently and efficiently converting every tangible thing into small, curved pieces of metal until it stumbles upon human beings.  It logically reasons that humans could turn off the Paperclip AI, interfering with its primary goal, and that humans possess atoms that could become paper clips.

Thus, converting humanity into paper clips is a means both to preserve itself and to fulfill its purpose.  Having at some future time colonized every planet in the universe, the Paperclip Maximizer ultimately does achieve its goal, converting itself into a HiQin™ brand Animal Shaped Bookmark Clip: a dove, the last of which had long since been converted into a paper clip.
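
To make the failure mode concrete, here is a minimal, purely illustrative Python sketch; it is not drawn from any real system, and every name in it (World, run_maximizer, and so on) is invented.  It only shows the difference between an unbounded objective, which stops when nothing convertible remains, and one with even a crude bound.

```python
# Purely illustrative sketch: every name here is invented for this example.

class World:
    def __init__(self, objects):
        self.objects = list(objects)      # things that are not yet paper clips

    def has_convertible_matter(self):
        return bool(self.objects)

    def convert_next(self):
        self.objects.pop()                # one more thing becomes a paper clip


def run_maximizer(world, target=None):
    clips = 0
    # With target=None, the only stopping condition is "nothing left to convert".
    while world.has_convertible_matter():
        world.convert_next()
        clips += 1
        if target is not None and clips >= target:
            break                         # a bounded goal halts the loop early
    return clips


print(run_maximizer(World(["ore", "a car", "an office chair"])))            # 3: converts everything
print(run_maximizer(World(["ore", "a car", "an office chair"]), target=1))  # 1: stops at the bound
```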

This simple, yet concerning, thought experiment reveals one of several problems the burgeoning field of AI Control is wrestling with.  The “Paperclip Maximizer” was designed to demonstrate perverse instantiation, or “Be Careful What You Wish For” syndrome.  Furthermore, such an AI is acting in a rational and obedient capacity: it is producing paper clips efficiently, which is precisely what the benevolent engineer initially wanted.  Even when the goals of an AI are well-bounded, the most efficient route, which often also produces the most profit for businesses, is not always the best for humanity.

The archetypal AI villain, HAL-9000, is, in fact, a product of the Instrumental Convergence problem:  in the official canon, HAL was ordered by NASA to withhold any information pertaining to sentient alien life from the crewmen, but was also programmed by the HAL Corporation never to lie, a supposedly benign moral rule.  Thus, HAL killed the crewmen aboard because (1) with everyone dead, he would never have to lie, and (2) he could then withhold all information about alien life, since no one was left to put him in a situation that forced him to lie.  The keen bystander, therefore, would recommend we give an AI a moral code, something it could always consult to determine “right” or “common sense” conduct.  However, in the absence of an objective, all-encompassing ethical framework, any substitute rules we provide would be inadequate safeguards against a convergence event.

History has also demonstrated that evolutionary superiority does not fundamentally engender peace between higher and lesser beings:  it facilitates the lesser’s extinction.  Homo sapiens, the only human species left, hastened the extinction of its rival Homo neanderthalensis through violence spurred on by competition for resources, a potential parallel to the future relationship between humanity and AI.  The threat is compounded by recursive self-improvement, in which an AI’s ability to modify itself with its own superior intellect triggers an exponential explosion of further self-improvement.  This makes containing an AI even more difficult, if not futile, once it surpasses human intelligence, a threshold usually dubbed the “Technological Singularity” or “The Rapture.”

If humanity is eclipsed by an AI superintelligence, our ability to respond to an instrumentally convergent AI will be severely limited.  Today, the ethics around Artificial Intelligence and the prevention of Instrumental Convergence have a direct application in self-driving cars, most often popularized in the “trolley problem,” in which an unstoppable trolley will kill either several people or one person, depending on the configuration of a track-switching device.  The decision to switch tracks is a direct, intentional action causing only one death, while the decision to remain passive is an indirect action killing several.  Likewise, if a self-driving car cannot avoid a collision and has the choice between killing several people in its current path or one person on a separate one, what should it do?  The answer depends on whether the respondent subscribes to a utilitarian, or pure net-good, rationale that mandates switching tracks, or to an alternative worldview that mandates passivity.
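
As a rough illustration only, the two positions can be caricatured as decision rules.  The sketch below is hypothetical Python, not how any real autonomous-vehicle planner works, and every function name in it is invented.

```python
# Purely illustrative: invented function names, not a real planner.

def utilitarian_choice(deaths_if_stay: int, deaths_if_swerve: int) -> str:
    # Pure net-good reasoning: minimize total expected deaths.
    return "swerve" if deaths_if_swerve < deaths_if_stay else "stay"


def deontological_choice(deaths_if_stay: int, deaths_if_swerve: int) -> str:
    # Refuse any direct, intentional action that itself causes a death.
    return "stay" if deaths_if_swerve > 0 else "swerve"


print(utilitarian_choice(deaths_if_stay=5, deaths_if_swerve=1))    # swerve
print(deontological_choice(deaths_if_stay=5, deaths_if_swerve=1))  # stay
```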

In the early scenes of 2001: A Space Odyssey, the ultimate victor HAL-9000 plays a game of chess against Frank Poole, during which he makes several blunders that could have tipped off the crew to fatal flaws in HAL’s operation.  If humanity were Poole and extinction HAL, we would already be pulling ahead in terms of pieces, having survived ice ages, plagues, and atomic weapons, but the final move would be up to an AI with the wherewithal to checkmate Poole.  Of course, I’m not saying that Siri giving you search results for “Water boiler” instead of “Whataburger” is an ingenious plot to end the human race, but proactivity in the field of AI and a well-understood protocol for preventing hostility would be a boon to humanity, allowing an inevitable technology to also be a safe one.