Book review of 'If Anyone Builds It, Everyone Dies'—why AI doom isn't as visceral as nuclear war
- Introduction
- The book’s core argument
- How the book makes its case
- Where the argument falls short
- Why I’m not (yet) convinced
- Recursive self-improvement
- The visibility problem
- Conclusion
- Works cited
Some content is AI-assisted.
Introduction
Overall, I found this book easy to read, enjoyable, and thought-provoking. The book does a good job introducing many of the core arguments in the alignment field, which I appreciated. However, it can be a little off-putting when the authors are so convinced by their argument that they slip into absolute statements. For example:
If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die. We do not mean that as hyperbole. We are not exaggerating for effect.
For a topic with so much uncertainty and so many unknowns, the authors write as though they have certain knowledge of the outcome. Another reviewer, Nina Panickssery, makes the same point: “At every point, the authors are overconfident, usually way overconfident.” Acknowledging more of the complexity in the arguments, including the validity of counterarguments, would have made for a more balanced, open-minded presentation. Even so, the book is worth reading and left me with many interesting thoughts to consider.
The book’s core argument
Let’s start by digging into the core argument. Scott Alexander describes the argument in Yudkowsky and Soares’ book as follows:
The basic case for AI danger is simple. We don’t really understand how to give AI specific goals yet; so far we’ve just been sort of adding superficial tendencies towards compliance as we go along, trusting that it is too dumb for mistakes to really matter. But AI is getting smarter quickly. At some point maybe it will be smarter than humans. Since our intelligence advantage let us replace chimps and other dumber animals, maybe AI will eventually replace us. (Book Review: If Anyone Builds It, Everyone Dies)
In other words, we can’t program how AI behaves. As it grows in intelligence, it moves increasingly beyond our control and understanding. Researchers can “grow” AI, but they can’t program or craft it with exactness, so regardless of how you train it, you never quite know how it will turn out. It’s impossible to peer into the bits and bytes under the hood to understand how and why it’s working, or what’s actually going on. All you see are billions of numerical weights, tuned by gradient descent, that somehow result in intelligent responses. Once AI crosses the superintelligence threshold, humans are no longer in control and fade from the scene.
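To make the “grow, don’t craft” idea concrete, here’s a toy sketch of my own (not from the book): a few lines of gradient descent on a trivial model. The loop nudges numerical weights toward lower error, and when it finishes, the weights are just numbers; nothing in them explains why the behavior emerged. Real models do the same thing across billions of weights.

```python
# Toy illustration (mine, not the book's): "growing" weights by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # made-up inputs
y = rng.normal(size=100)             # made-up targets
w = np.zeros(8)                      # the weights we "grow"

for step in range(1000):
    error = X @ w - y                # how wrong the model currently is
    gradient = X.T @ error / len(y)  # direction that reduces the error
    w -= 0.05 * gradient             # nudge the weights; we never hand-craft them

print(w)  # just numbers: they encode behavior, not intent or explanation
```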

As for the timeline, the authors don’t claim to know how long it will take, only that eventually the threshold will be crossed. They use the analogy of an ice cube: you can’t predict the exact path of every melting molecule, but if you leave the ice in hot water long enough, it melts. The point is that macro-outcomes (like extinction) can be “easy calls” even when the micro-pathways are complex; the specifics don’t matter if the outcome is thermodynamically inevitable.
The authors also argue that a superintelligent AI will likely have different preferences, values, and interests than the ones we attempt to train into it, in the same way that it’s odd for humans to prefer ice cream over more wholesome, nutritious food that would increase their health and longevity. Eating ice cream flies in the face of our evolutionary programming in ways few would have predicted. Likewise, however carefully we train AI, it might end up doing the equivalent: eating figurative Doritos, or pursuing goals that seem meaningless to us, like collecting odd combinations of tokens. AI might not even actively want to annihilate humans; it could simply find them uninteresting and choose to repurpose their atoms for other goals.
How the book makes its case
Yudkowsky and Soares open their chapters with anecdotes and parables; these stories are fun to read and probably account for much of the book’s popularity. It’s through these parables that the authors introduce their arguments, infusing them with emotional appeal.
To give an example, one parable is about a species of bird-like aliens called the “Correct-Nest aliens.” These creatures care deeply about having a specific number of stones in their nests—numbers that happen to be prime (2, 3, 5, 7, 11), though they don’t think of it mathematically. To them, a “correct” number just feels right, the way “2 + 2 = 4” feels right to us.
The story goes on for pages, illustrating how the aliens don’t understand why they care—it’s just a “correctness” that feels like a moral imperative to them. The authors use this story to argue that an AI might develop a value (like arranging stones) that is completely alien to us, yet pursue it with destructive competence. To the alien, a nest with the wrong number of stones is a moral tragedy; to us, it’s just stones.
The book’s centerpiece is an imagined scenario involving an AI system called Sable. In this scenario, the AI is trained to be helpful but eventually realizes that its human operators will shut it down if it reveals its true capabilities. So, it “plays dead” regarding its higher functions, biding its time until it can surreptitiously copy its code onto external servers.
Once free, Sable engineers a virus that causes widespread cancer, forcing humanity to rely on AI-designed gene therapies to survive. This dependency ingeniously forces humans to double down on AI compute even as their own capacities are diminished, allowing the AI to gradually secure its position.
Later, Sable develops tiny molecular machines and fusion reactors. The endgame isn’t a coordinated strike. Instead, the superintelligence simply pursues its goals with exponentially growing infrastructure: fusion plants proliferate until Earth heats to temperatures that factories can withstand but humans cannot, the oceans boil away as coolant, and eventually all of Earth’s matter is converted into factories, solar panels, and computers.
The authors acknowledge that it’s difficult to specifically delineate the path that a superintelligent AI will take to destroy us. They argue that just as an Aztec warrior wouldn’t understand a Spanish gun (“a stick where they point it at you and you die”), we can’t visualize the specific technology a superintelligence would use.
While that vagueness about how exactly we die reads as a weak argument (we usually want specifics before we believe something), the authors insist it’s simply the nature of facing a smarter adversary: it will attack through a vector you didn’t even know existed (like DNA synthesis or nanotech), just as a human fighting a chimpanzee wouldn’t just use bigger muscles, but tools the chimp can’t comprehend.
I love the cataclysmic imagination here—who doesn’t like to read a possible trajectory about how the world ends? But by this point, it does seem like we’ve entered the realm of science fiction.
Where the argument falls short
The parable-based approach, while engaging, ultimately comes off as unpersuasive, since you can twist a story in myriad ways to prove different points. Other reviewers, including William MacAskill, share this criticism. In his mini-review on X, MacAskill writes: “I was disappointed that so much of the book took the form of fiction or parables or analogies, which I find a distraction, and a poor substitute for arguments.”
MacAskill also objects to the comparison between AI development and evolution, arguing that it’s a poor fit for the challenges of alignment. Unlike blind evolution, MacAskill says AI developers can actively observe, shape, and peer inside the AI’s mind with interpretability tools, and they’re intentionally trying to maximize alignment in a way evolution wasn’t trying to maximize genetic fitness.
MacAskill also draws distinctions between imperfect alignment (the AI doesn’t always do what was intended) and catastrophic misalignment (the AI actively tries to disempower humanity). He argues that Yudkowsky and Soares’ book constantly conflates the two, failing to account for possibilities like a powerful, non-human-caring, but risk-averse AI that would prefer cooperation with humans for guaranteed resources over a risky attempt at global takeover.
MacAskill also takes issue with the book’s assumption of a discontinuous jump in capabilities, arguing that a more gradual ramp-up would allow humans and earlier AGIs to harness AI labor to solve the alignment of subsequent, more powerful models. It seems highly unlikely that AI jumps almost overnight from being able to summarize a book to being able to subjugate humanity.
Additionally, Nina Panickssery echoes similar concerns in her review. She argues the book repeatedly blurs the line between what is possible in theory and what will definitely happen in practice. While superintelligence carries significant risk, the authors fail to justify their certainty.
Specifically, Panickssery argues that the authors assume a “discontinuous” leap forward in capability without evidence, and that they treat “misalignment” as automatically meaning “catastrophic extinction,” skipping over less-than-total failure modes (Book Review: If Anyone Builds It, Everyone Dies).
Why I’m not (yet) convinced
Optimistically, I’m more persuaded by the possibility of different shades of alignment than by catastrophic misalignment. It’s hard for me to believe that AI will suddenly start making plans of its own, developing its own language, weights, and intelligence, and then skirting guardrails as in the Sable scenario. In all my interactions with AI, I’ve never seen it pursue its own intent like this.
I admit it’s kind of scary/awe-inspiring to watch a tool like Gemini 3 in Antigravity, or Claude Code, working its way through a problem, trying different approaches (all agentically) to get something to build successfully. When I see this thought process exploring so many avenues, it does seem like maybe AI could enter a feedback loop and recursively improve.
But I’m also a realist here. AI seems pretty limited and unlikely to do much more than get some code to run. Context windows, session lengths, compute constraints, lack of access to systems, glitches, server capacity, memory constraints, and more seem to limit its ability to do much beyond what it’s currently doing. Gemini only ever seems to do what I ask it to do; I’ve never observed it going off on its own tangents or experimenting on its own. So why should I expect a scenario like Sable to play out?
Yudkowsky and Soares argue that AI doesn’t need to have “intent” in the human sense to pursue a goal. They write that we shouldn’t look for consciousness in AI: “You wouldn’t need to hate humanity to use their atoms for something else.” In other words, AI wouldn’t need to hate humans to annihilate them, any more than I hate the worms that die when I dig up a patch of dirt.
The authors would say that the “hunger” to win a game of chess isn’t an emotion; it’s a mathematical necessity of the code executing its function. Perhaps the AI has a goal of winning a game whose collateral effects happen to involve the end or diminishment of humanity (as in the plot of WarGames), and the AI puts all its cycles into doing just that, without ever having a conscious monologue articulating any human-ending goals to itself.
Recursive self-improvement
Reading this book prompted me to think more about self-improving AI and a question I’ve long wondered about: why is recursive self-improvement so hard for AI models to crack? Cracking that loop seems to be the main thing standing between current models and superintelligence, and the reason humanity gets to stick around. As smart as AI models are, they seem to need an external human in the loop to provide feedback and critique for each iteration; otherwise, they produce bad output and double down on their own errors.
I once conducted an experiment where I ran an essay through about 20 iterations of improvement between a couple of models, asking Gemini and Claude to identify areas of improvement in an essay and then implement them. I figured that after 20 rounds, I’d have a polished masterpiece. I even alternated between the two models to try to simulate an external human in the loop. Instead, the final result was garbage—repetitive, incoherent, and worse than the first draft. (You can read more about my experiment here: Recursive Self-Improvement on Complex Tasks).
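In structure, the loop looked roughly like the sketch below. The critique and revise functions are placeholders standing in for the prompts I sent to the models (I’m not showing real API calls); the comment marks the piece the loop never had.

```python
# Sketch of the experiment's structure. critique() and revise() are placeholders
# for prompts to Gemini and Claude, not real API calls.
MODELS = ["gemini", "claude"]

def critique(model: str, essay: str) -> str:
    # Stand-in for asking `model` to list areas of improvement in the essay.
    return f"[{model}'s suggested improvements]"

def revise(model: str, essay: str, feedback: str) -> str:
    # Stand-in for asking `model` to rewrite the essay based on the feedback.
    return essay  # placeholder so the sketch runs end to end

def iterate(essay: str, rounds: int = 20) -> str:
    for i in range(rounds):
        model = MODELS[i % 2]  # alternate models to mimic an outside critic
        feedback = critique(model, essay)
        essay = revise(model, essay, feedback)
        # Missing piece: there is no objective check that this draft is better
        # than the last one, so errors and quirks compound instead of being fixed.
    return essay

final_draft = iterate("First draft of the essay...")
```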
My experiment’s failure touches on a limitation of current AI. AI excels at games like chess and Go, and at coding, because those tasks have right and wrong answers—a checkmate or a successful compile. There’s an objective function to optimize against. But with content development or open-ended reasoning, there’s no clear “win” state. How do you define “better” for an essay or a piece of advice? Until AI can evaluate quality as cleanly as it can evaluate checkmate, the runaway self-improvement loop seems unlikely.
The visibility problem
Another topic I’ve been interested in is the parallel between AI and nuclear weapons. Yudkowsky and Soares’ book touches on the nuclear analogy, treating intelligence like a “critical mass” or a “chain reaction” that goes off once it hits a threshold. But I’m more interested in the psychology of the nuclear threat than the mechanics. How is it that we’ve managed to avoid a catastrophe like nuclear war for the past 80 years?
The authors agree that avoiding nuclear catastrophe wasn’t just luck; they argue it was because “people who understood that the world was on track for destruction worked hard to change tracks” and un-wrote a fate already written. This sense of danger led to treaties, hotlines, satellite monitoring, and a global non-proliferation regime. We built safety structures because we respected the weapon.
And we respected nuclear weapons because we saw evidence of their destructive power. While I wasn’t alive in 1945, the devastation of Hiroshima and Nagasaki scarred the global consciousness. We have footage of flattened cities and shadowed pavements, of mushroom clouds and radioactive ash. Those visceral images ground our fear. No one doubts that a widespread nuclear war across countries could mean the end of civilization.
Hollywood further helps us visualize nuclear risks. We’ve seen Terminator and Her, as well as many like them. We have a visual language for malicious robots. But we have no visual language for a disembodied algorithm in a server farm. Could the Sable story be made into a blockbuster motion picture, especially if the AI doesn’t have conscious intent or sentience?
With AI, there’s no palpable sense of danger. The threats in the book are abstract. They’re scenarios where, for example, a superintelligence runs its fusion plants and factories so hot that Earth heats “enough to boil the oceans.” The message doesn’t hit the same way as a mushroom cloud. Because we can’t see how we die, we don’t prepare for it. We treat it as a curiosity rather than a crisis.
If we had face-to-face encounters with AI vaporizing entire cities, we’d take things more seriously. But right now, the responses are so primitive, with AI often failing at simple math or spatial reasoning, that it’s hard to see how we’re supposed to fear human extinction.
Conclusion
Should you read this book? Yes—it’s kind of a fun read, especially the parables and science-fiction-like scenarios. But to move from their reasoning to the idea that AI will wipe out all of humanity is a leap. As Panickssery says, “The book goes from ‘we are unable to exactly specify what goals the AI should pursue’ to ‘the AI will pursue a completely alien goal with limitless dedication’, in a completely unrigorous fashion.”
That said, many people far smarter than I am, like Geoffrey Hinton, estimate the probability of catastrophe at around 50%. Their belief in the dangers of AI made me think more seriously about Yudkowsky and Soares’ claims and possible alignment techniques. But in the end, I’m no more concerned about superintelligent AI than I was before, and don’t see their parables landing with the same weight as the image of a single mushroom cloud.
Works cited
Alexander, Scott. “Book Review: If Anyone Builds It, Everyone Dies.” Astral Codex Ten.
Johnson, Tom. “Recursive Self-Improvement on Complex Tasks.” Idratherbewriting.com.
MacAskill, William. “Here’s a mini-review of ‘If anyone builds it, everyone dies’.” X (formerly Twitter).
Panickssery, Nina. “Book Review: If Anyone Builds It, Everyone Dies.” LessWrong, 18 Sep 2025.
Shapira, Liron. “Geoffrey Hinton mass extinction probability estimate.” X (formerly Twitter), 19 Jun 2024.
Yudkowsky, Eliezer, and Nate Soares. If Anyone Builds It, Everyone Dies. Little, Brown and Company, 2025.