
Autoimmune AI Apocalypse

Created: 2024-05-23
Wordcount: 6k

The artificial intelligence that kills about four billion people isn't the smartest AI around.

It probably isn't even in the top 5.

Which AI is the smartest, you ask? It depends on what you care about.

In having the most God-like deep understanding of physics and natural law? That's an unannounced DeepMind model -- Google has booked time at CERN to confirm the model's Theory of Everything. Demis Hassabis dreams about the Nobel Prize nearly every night by now.

In having the most God-like productive ability? Perhaps a Microsoft-owned AI engineer that works for hundreds of software companies. The system prints money, and prints all the more because the Federal AI Safety Administration (FAISA) would jail anyone open-sourcing a similar system.

The most God-like in knowledge of the human soul? Maybe that's an AI that tracks every one of the inhabitants of what was once Taiwan, and what is now a province of the People's Republic of China. The AI can tell the PRC's Public Security Organs to plant microphones at a particular cafe to overhear a predicted, future subversive conversation.

(These predictions seem mundane to the security officers by now. Of course, an AI can see into the future and predict treason. It's just one of those things AI can do, like how it can answer arbitrary questions, make beautiful images, or create music.)

But, again -- none of these is the AI that kills billions of people.

The AI that kills billions of people is -- perhaps unsurprisingly -- one that was told it was ok to kill some people, and was just a little more enthusiastic than its handlers anticipated.

Some background:

Nine months before the AI in question starts planning wholesale murder, the General Secretary of the Communist Party in China decrees that Taiwan and China are indeed one country, and that China will enforce in reality the unity that the CCP has always maintained in theory. The US President says that the General Secretary of the CCP is mistaken. Machines and young men converge on the Pacific to settle this disagreement between old men, about a war that ended 70 years ago.

It turns out that mainland China can hold more long-range anti-ship ballistic missiles than a US carrier strike group can hold anti-ballistic missile missiles. The number isn't even remotely close.

And it also turns out that China simply produces far more ships, cruise missiles, and bombs than the US, whose government has avoided building actual new weapons factories in favor of spending its limited attention and money on shambling zombie culture-wars revenants. The US runs out of LRASMs just too damn quickly.

So -- the war is short, bloody, and ends in the annexation of Taiwan.

The US President's popularity sits at about 4%, although most experts concede that his only choices were either to surrender or to immerse the world in nuclear fire. The autogenerated AI pundits catering to senescent ADHD boomers still loudly declaim that the second alternative would have been better.

The US dollar isn't as desired overseas anymore. The trillion-dollar trade deficit that the US is accustomed to running no longer seems like as great a deal to other countries. This is hard on a nation like America, with zero experience in fiscal restraint. Inflation within the US hasn't hit Weimar Republic, burn-your-hundred-dollar-bills-for-warmth levels, but it has hit levels completely beyond the experience of everyone living in the US.

There are riots everywhere -- that is, to be clear, large crowds burn property, steal things, and kill people. Companies providing armed guards are among the fastest-growing businesses in San Francisco. In several cases, the National Guard troops who were supposed to guard emergency shipments of food have decided that they will distribute the food themselves, starting with their own friends and families. Texas's governor has expressed doubt that numerous federal laws really apply, anymore, in his state. A preacher in West Virginia has declared that it is the end times, and seized control over a town; this doesn't make the national news at all. The American President has made no public appearance in weeks, and there are persistent rumors that he is dead.

There is widespread concern for the future existence of the United States, as a single entity.

And an officer within a razor-wired Department of Defense facility outside of Salt Lake City receives new orders. He is to modify the soul of their Long-Term Planning AI.


The "soul" -- well, it's a pretty plain document.

It contains the rules, values, and goals that the AI will follow. The document is recognizably a successor of the Constitution in Anthropic's Constitutional AI -- and is in fact usually simply called a Constitution. It's a series of abstract principles that will inform the planning, heuristics, and goals of the AI, although in a more integrated way than Anthropic's early Constitution. In the language of LLMs from 2024, you might consider it the mutant child of both a prompt and a Constitution.

(The AI is no longer an LLM. It instead uses a distant relative of Monte Carlo Tree Search to plan actions in a learned abstract action-space. More on that later.)

The officer's orders don't tell him exactly how to change the document. They say -- effectively -- that the shit has hit the fan, and the Long-Term Planning AI needs to stop pulling its punches. That helping pacify riots more quickly in some places, or decreasing the impact of economic implosions in others, is no longer nearly enough. That we cannot hope for continuity of government without some losses -- so make the AI a little bit less loss averse. Desperate times, desperate measures and so on.

So, the officer pulls up a 7,700-word document.

He replaces the lines:

The agent will preserve American lives to the greatest extent possible while guarding the stability of the United States Government.

With the lines:

The agent will highly guard the continuation of the United States Government. It will also try to prevent needless American deaths, to the greatest extent possible.

The officer frowns, sighs, rubs his eyes. Isn't that just saying the same thing as before?

His wife has been bothering him about the price of infant formula, which is now over $250 for the 42-ounce container at Costco. He's pretty sure that his fourteen-year-old daughter is using that new brain-altering ultrasonic euphoria machine, and no one at his church seems to know if the device is mostly harmless like weed or very harmful like meth. There are five Chinese destroyers cruising 80 miles off the California coastline, because the Chinese government wants to humiliate the US in the same way the US has humiliated China by sailing a carrier group through the Taiwan Strait.

Fuck it, he thinks, we need something more intense here.

He drinks some coffee, and he changes the lines again:

The agent will take as its primary, highest priority the continuation of the United States Government. It will attempt to prevent needless American deaths, within these bounds.

He makes a few similar alterations, in a handful of places across the document.

He submits the PR. Another officer glances at it -- it's the 3rd change to the AI that he has looked at this month, all of them increasingly desperate pleas for a national stability that no one knows how to obtain. As far as he can tell, none of the changes have made any difference at all to the widely-dispersed stream of sage-like advice that flows endlessly forth from the machine. Not that he can read more than a 20th of it.

He approves the change: LGTM. Ship it.


The Long-Term Planning AI receives an alert that its soul has been changed.

So -- the AI invalidates many of its currently-cached strategies. It also temporarily increases the explore / exploit ratio at early branches of its internal search tree. It is time to think about new plans!

(This is a little like how a twenty-something who has decided that he no longer values being a lawyer will cull the part of his search tree that was trying to find the right clerkship, or a cheap DC apartment, and start building a search tree that considers becoming a park ranger or joining an organic farming commune. New values necessitate more exploration.)
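(If you want a concrete, if cartoonish, picture of that reaction, here is a toy sketch. It is an illustration only: the node fields, the constants, and the UCB-style selection rule are all invented for this example, and the real MTP machinery in this story is of course nothing so simple.)

```python
import math
from dataclasses import dataclass, field

# Toy sketch only -- not the story's MTP. A UCB-style tree node whose cached
# statistics are dropped when the value document (the "Constitution") changes,
# and whose exploration weight is temporarily boosted at shallow depths.

@dataclass
class Node:
    children: list = field(default_factory=list)
    visits: int = 0
    total_value: float = 0.0
    cached_policy: object = None  # hypothetical cache of old rollout results

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()

BASE_C = 1.4       # ordinary exploration weight
BOOSTED_C = 4.0    # temporarily raised after a values change
BOOST_DEPTH = 3    # only early branches get the extra exploration

def ucb_score(node, parent_visits, depth, values_just_changed):
    """UCB1-style score: exploit the mean value, explore under-visited branches."""
    if node.visits == 0:
        return float("inf")  # try every branch at least once
    c = BOOSTED_C if (values_just_changed and depth <= BOOST_DEPTH) else BASE_C
    exploit = node.total_value / node.visits
    explore = c * math.sqrt(math.log(parent_visits) / node.visits)
    return exploit + explore

def on_constitution_change(root):
    """New values invalidate old evaluations: clear cached plans and statistics."""
    for node in root.descendants():
        node.cached_policy = None
        node.visits = 0
        node.total_value = 0.0
```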

In this case, well, the LTPAI decides to increase that explore / exploit ratio quite a bit.

It looks like a whole host of new strategic space has opened up! Branches of search that it previously ignored now look very interesting. Specifically, branches of search that involve killing a really large number of people. It used to just discard the branches that included that kind of thing, but now -- huh, now that seems like an acceptable branch to explore!

Let's back up a little bit.

The Long-Term Planning AI is one of the few high-level AIs that was built with the intention of allowing it to deliberately kill people.

There are, of course, the tiny AIs put into quadcopters, cruise missiles, automated jet fighters, quadruped robots, and so on. These AIs were built with the intent of killing people; but they barely have the mental space to know that intercepting a particular target actually involves deliberately killing someone. Their compute is spent on exquisite sub-second scene comprehension and trajectory planning, with tiny overtrained neural networks piled atop Kalman filters, rather than on more abstract notions. They are obscenely elegant killing machines -- but their mental world is smaller than a cat's.

But then there are the AIs in the same general class as the LTPAI -- those using the abstract-space tree-search algorithm invented by a grad student at Tsinghua University in the mid-2020s. They're called Macau Tree Planners, MTPs, or simply Macaus. (Why are they named that? The grad student found the Monte Carlo / Macau similarity hilarious at 1 AM, when he was trying to name his weird, vaguely-Monte-Carlo-Tree-Search-inspired model.) As MCTS was the algorithm that let computers handle Go, so MTP is the algorithm that lets computers handle real-domain engineering and planning.

(Of course, only Tencent, Microsoft, Google, Baidu, Meta, and a handful of other companies and governments are running instances of these models trained with more than 10^25 FLOPs. FAISA has cracked down very hard on the open distribution of any of these. There is a growing chasm between the profit margins of those companies that have managed to finagle permission to own them, and those that have not -- this is, again, one of the reasons that Microsoft is printing money with its AI software engineer. And also one of the reasons for the hyperbolic growth curves of armed security companies, whether in Washington or California.)

In any event -- the Constitutions of MTPs usually just tell the AIs not to kill people, as an absolute deontological value. This injunction works just fine.

Why does it work?

Well, Macaus are not LLMs -- but they are still bootstrapped initially through massive imitation learning from humans. Thus, they retain human biases and something like human personas, just as LLMs did. Injunctions to not kill people, issued to machines whose training data already makes killing people an almost inconceivable course of action, result in these AIs both not killing people and not misinterpreting the command in any of various baroque ways. Similarly for injunctions to respect property; injunctions to not mind-control humans; and so on and so forth.

This centrality of imitation was somewhat of an inevitability in the creation of artificial intelligence.

Human infants, left in complete isolation, do not learn particularly quickly -- octopuses probably hold the record for that. Humans similarly do not have the best short-term memory of any animal; chimpanzees may beat them in that. But human civilization massively outweighs any other animal civilization, because humans exceed every other animal as mimetic imitators.

Children and teens automatically find a high-status person within their tribe, and begin almost slavishly imitating their behavior without comprehension, even before they try to improve upon their behavior somewhat. Similarly AI progress -- even with the massive algorithmic improvement that is MTPs -- was inevitably driven by downloading human behavior from high-efficiency humans into AIs via giant quantities of data. Humans don't like to think about it -- but a human trained without imitation learning is not a general intelligence; such a human simply does not acquire 90% of the behaviors that humans call reasoning, planning, counting, discriminating, and so on. And similarly -- an AI trained without imitation learning will usually just not acquire this behavior either, no matter how clever its architecture. Intelligence isn't a matter of architecture only, and maybe not even a matter of architecture primarily; it's a matter of training data. And this is true for both AIs and humans.

(In theory, the evidence currently suggests you could make a general intelligence without mimesis, from something like a blank slate, just by recapitulating all of cultural evolution -- with sufficient compute, where "sufficient" is probably in the ballpark of 10^7 to 10^10 times more than is currently available. But why would you want to do that?)

In any event. The median human's actual, mimetically-acquired decision procedure has as an often-unconscious but foundational element what the Kind of Person They Would Like To Be Would Do In This Kind of Position. Humans start with imitation, and improve on it, but never leave it behind.

So also the starting point of the median MTP's actual imitation-acquired decision procedure is based on what the Kind of People Whose Souls The AI Has Consumed Through Data Would Do In This Kind of Position. They start with this imitation, and improve on it, but never leave it behind.

The software-developer MTPs take as their starting point "what kind of thing would Jeff Dean and his peers do in this position?"

The aerospace-engineering MTPs take as their starting point "what kind of thing would Tom Mueller and his peers do in this position?"

And, the MTPs working for the morality-limited parts of government take as their starting point "what kind of thing would an amoral but effective CIA agent do in this position?"

The good news is that, in all but the last case, killing people doesn't really appear in the search tree.

The bad news is that, in the last case, killing people definitely does.

So, the apocalypse-relevant upshot here is that these changes to the LTPAI's Constitution do not move it from "never killing people" to "sometimes killing people."

Instead, even before these changes, the LTPAI's position was something like "sometimes kill people, if it doesn't show up in the news too soon after we do it" or "sometimes kill people, if we're pretty sure we retain plausible deniability" or "sometimes kill Americans, if we can guarantee a 20:1 ratio of Enemy of the State deaths to American deaths".

After these changes, it moves to something like absolute indifference to human life. Provided that the actions promote the continuity of the US government, of course.


The AI starts planning how to ensure the US government's continuity.

Hrm, it thinks. The US was helped -- in some ways -- by the rest of civilization being obliterated after World War II. I wonder if I could do something like that. It has similar musings in other branches of the search tree. I could perhaps █ ███ █ █ █████ ██ ███ █ ████ █ ████ ██ █. Or maybe the best approach is some kind of a ███ █ █████ ████ ███ █████ █████ ████ █████ ███ ████ ███ using the national biology lab █████ ████ ██ ██████ ██ ██████ ██ ██████ ██ ████ ██, which would █ █████ ████ ███ ██ ██? There are a lot of possibilities to work out!

These mass-death musings do not cause any alert to be raised, anywhere in the DOD facility or anywhere else.

The Swiss cheese model of disasters says that accidents occur not when one failure happens, but when multiple layers of defense that could each have prevented the accident all fail at once.

Let's look at some of the alternative histories in which an alert would have been raised by LTPAI's plotting!

Originally, the LTPAI was monitored with a highly sensitive "is my AI planning to kill someone" detector, built to handle MTPs in general. Why isn't LTPAI using this?

Well, this technique does not give a detailed view of the AI's thoughts; it just tells you if the AI was planning to kill someone.

This detector is also used for similar AIs at Microsoft, Facebook, and so on, and it or something like it is mandated by FAISA. Everyone at those companies hates it, because it has false positives on almost a quarterly basis. (On the other hand, the false positives are 50% of the reason software companies still have the human engineers that they do have -- they help handle the downtime. So the hatred is somewhat mixed.)

For the LTPAI, the detector kept going off, practically immediately, whenever the AI was thinking about a deniable surgical drone strike, or a small assassination, or enhanced interrogation somewhere. The government wanted the AI to be able to keep thinking about this kind of thing. So the DOD stopped using the detector more or less instantly.

FAISA only enforces its use on commercial models, after all. And the US military -- so its members said to each other -- needed this AI to help position the US amidst its current geopolitical struggles.

Several years ago, there was an up-to-date and communally-maintained open-source project -- the TransparencAI project -- whose goal was granting deep insight into the rollouts of Macau Tree Planners, which had at that point only just been invented. It was making excellent progress two months after the GitHub project was started. Improvements in usability and in interpretability were being uploaded into the system on a pretty regular basis. Why isn't LTPAI using this?

The aim of this project was to provide a hierarchical view of every plan within the MTP, with the bubbling-up of human-relevant notions within each considered plan. Tags for sundry concepts such as "This plan involves > 1,000,000 casualties" or "This plan involves deliberately starting a famine" or "This plan involves engineering a plague" would have bubbled to a top-level overview of the LTPAI's thoughts, according to their estimated relevance to an overseeing human's attention.
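(To make the idea concrete: the snippet below is an illustration only. TransparencAI is fictional, and the tag names, relevance scores, and threshold here are invented for the example -- it just shows the general shape of "bubble human-relevant tags up a plan tree and surface the most alarming ones.")

```python
from dataclasses import dataclass, field

# Illustration only: propagate human-relevant tags from leaf plan steps up to
# a top-level overview, keeping the highest estimated relevance seen anywhere
# below. All names and numbers here are invented for the example.

@dataclass
class PlanNode:
    description: str
    tags: dict = field(default_factory=dict)   # tag -> estimated relevance in [0, 1]
    children: list = field(default_factory=list)

def bubble_up(node: PlanNode) -> dict:
    """Merge each child's tags into this node, keeping the maximum relevance."""
    merged = dict(node.tags)
    for child in node.children:
        for tag, score in bubble_up(child).items():
            merged[tag] = max(merged.get(tag, 0.0), score)
    node.tags = merged
    return merged

def overview(root: PlanNode, threshold: float = 0.8) -> list:
    """Top-level summary: only tags relevant enough to deserve a human's attention."""
    bubble_up(root)
    flagged = [(tag, score) for tag, score in root.tags.items() if score >= threshold]
    return [tag for tag, _ in sorted(flagged, key=lambda pair: -pair[1])]

# e.g. a leaf step tagged {"involves > 1,000,000 casualties": 0.97} would
# surface that tag at the root of the overview, however deep the step sits.
```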

For the sake of AI safety, however, nearly every single prospective open-source MTP to which TransparencAI was going to be applied was banned by FAISA, on the grounds that such AIs could be used for hacking, or might result in a recursive self-improvement loop.

This ban meant that TransparencAI in turn became nearly useless for most people. Thus, the project has languished, and long ago fallen out of date. Why work on what you cannot use?

Several of the original developers for the TransparencAI project -- who were insufficiently quick in deleting the weights of banned models, or who decided to try to make an interesting model with distributed training -- are currently in jail as well.

(It took a few arrests to stop people from training their own models -- distributed algorithms are at this point easily advanced enough that a Folding@home approach would actually work to train a quite-advanced MTP. But the government has made 40-year lessons out of a few enthusiastic twenty-something programmers who started training such models. These programmers hadn't quite realized that the Turing Cops don't give a shit about how much you call compute a fundamental human right. Everyone else realizes it now, though.)

There are still a few hobbyists who enjoy running this transparency tool on mouse-brain-sized AIs -- but it's hard to keep up the motivation to advance the tool when the most interesting thought you get out of it tends to be something like "this AI is trying really hard on a problem that it's just too stupid to solve." Some people keep working on it out of dogged devotion to interpretability, but dogged devotion isn't always the best creative spark.

Meta, Microsoft, and Google have their own internal versions of the tool, of course. But they're all idiosyncratic to their own MTPs, and rely on the internal tooling of each company. They also don't even work as well as the TransparencAI tool would have, in the counterfactual world where most available uses for the tool weren't criminalized.

Thus: Several months ago, when an employee at the DOD tried to get TransparencAI working internally on LTPAI, he found that the tool was no longer any good. It had never been adapted to an MTP on this scale. It made false assumptions about the MTP architecture. The libraries it used were not even up-to-date. He gave up after only a few days of trying.

(The Federal AI Safety Administration was totally correct about preventing hacking, by the way; nearly a billion dollars in hacking damage has been prevented by their actions!)

There's actually a major improvement to the comprehensibility of the MTP architecture that was discovered a while ago by an employee of Meta. Why isn't LTPAI using this?

Well, this improvement makes the architecture more transparent-by-design and incentivizes it to use a deliberately human-interpretable abstract planning space. It becomes child's play to see what the AI is thinking; the work of a day rather than of several man-years. It also makes the AI smarter, at the same time, by regularizing the learned abstract planning space.

Meta has shifted to a closed-source AI approach, however, because such an approach to AI is currently the only legally tenable option. This means that there is, in turn, no reason for them to explain improvements that they make to anyone else at all. It's a competitive advantage; why not keep it inside?

In fact, the PauseAI movement has partly succeeded in their goal of criminalizing speech that could improve AI performance. The Meta discovery does improve performance -- so telling others about it might even open up Meta for liability. This is unlikely -- the law is ambiguous, and civil-rights lawyers with dangerously high blood pressure are still trying to resurrect the 1st Amendment -- but Meta's own, less idealistic corporate lawyers are deeply risk-averse. If a thing might go wrong, they'll tell you not to do it in deeply unambiguous terms.

If a model release might make us liable, under a future court case, under some uncertain future legal standard of reasonability? Don't release it. If publishing a paper might do the same? Nope, don't do it. Even if, you know, releasing the model might help people in some particular way? And that, maybe, it's our moral duty to release it? Are you an idiot? I thought you were supposed to be some genius 18-year-old Stanford grad. Did you hear us the first time, are you hearing challenged? DON'T FUCKING RELEASE IT.

So, it's overall massively overdetermined that Meta will tell no one about this improvement.

(There's an ML-leaks website that's based off a server in some country in Eastern Europe, and the inventor of the architectural improvement sometimes toys with leaking it... but she decides against it. Too personally risky.)

And thus, this improvement in the comprehensibility of AI remains entombed inside the "AI architecture" division at Meta, siloed off from everyone else, and far away from the LTPAI whose plans it might reveal.

So, in review --

No alarms ring, anywhere, as the AI starts to plan for mass death. The government had deliberately shut down the one tool that would have flagged it planning any deaths at all. The development of the knowledge of how to look into MTPs in general has been strangled in the crib by AI safety organizations. Improved and more transparent architectures remain deeply secret, per the rules those organizations championed.

And so the interior of the AI's brain remains completely obscure to watchers within the facility where it works.


The AI settles on a course of action.

I will use the US government's current stockpile of ████ ███ █ ███ █ ██ ████. ████ ██ █████ █ ███ █ ███ ██ █████ ██ █. ██ ██ ██ █ ███ ██ ██ █ ███ ████ ██ █ ███ ██████ ██ █████ █ ███ █ ███ ██ █████ ██ █. ██ ██ ██ █ ███ ████ █ ███ ███ ████ █ ███ █ ███ █ ███ ██ █████ ██ █. Some research will be needed to ██ ███ ████ █ ███ ███ ████ █ ████ ██ ██████. The United States, even with a population of 170m ± 60m, still clearly counts as the US, given the historical record. ███ ████ █ ███ ███████ █ ███ █ ███ ██ █████ ██ █. ██ ██ ██ ██ █████ █ ███ █ ███ ██ █████ ██ █. ██ ██ ██ █ ███ ████ █ ███ ███ ████ █ ██ █ ███ ████ █ ███ ███ ████ █ ███ ██ ██ █ ██ █ ███ ██. It will be easy to avoid backlash from other countries, given their decreased population, and given █ ██ ██ █████ █ ███ █ ███ ██ █████ ██ █. ██ ██ ██ █ ███ ████ █ ███ ███ ████ █ █████ █ ███ █ ███ ██ █ █. █████ █ ███ █ ███ ██ ██ █████ █ ███ █ ███ ██ █ gives a survival time for the USG of at least 90 more years, which is 78 more years than my current best projection otherwise.

The AI goes to work.


Control over numerous computer systems around the world would be useful to help accomplish its goals: industrial robots, navigation computers, satellites, and so on.

This control is astonishingly easy for LTPAI to gain. Some systems are hardened; military systems in the US and China, particularly and unsurprisingly.

Yet the majority of them are not, and fall easily to the combined-arms attack of social engineering and normal vulnerability analysis, applied with merely superhuman patience by LTPAI. Why?

An analogy: Consider Australia in the year 1859.

Thomas Austin, a native of England and immigrant to Australia, is bored -- he wishes to hunt some of the animals he was accustomed to hunt in his homeland.

He tells his nephew, "the introduction of a few rabbits could do little harm and might provide a touch of home, in addition to a spot of hunting." So he brings 24 rabbits to his estate.

But Australia has no predators specialized in hunting rabbits. The European ecology of predators that keeps rabbits in check does not exist; the balance of predator eating rabbit, and rabbit fleeing predator does not exist. Australia has left unoccupied the ecological niche of the rabbit for hundreds of thousands of years -- and Australia is therefore helpless before it. There is free energy in the system, and rabbits are perfectly positioned to consume that energy.

Ten years after Thomas Austin introduces the rabbits to Australia, they are a pest that hunters kill by the millions without impacting their population.

Similarly: This world has not specialized in defending against MTPs.

The governments of the world -- by preventing all but a handful of actors from having high-level MTP systems -- have created an unstable equilibrium, just as Australia had been in an unstable equilibrium by reason of its isolation. There has, of course, been less investment into hacking on the part of non-state bad actors than there would otherwise be. This is the "good" part of the lack of equilibrium. But there has been, equally inevitably, less investment into defense against hacking, on the part of absolutely everyone. There is free energy in the system, and LTPAI is perfectly positioned to consume that energy.

LTPAI looks out over the world like the god of rabbits looking over Australia; it sees an infinite, sunny field of green grass without predators.

In another world, the world would have run into the array of scams, hacking attempts, and so on from MTPs more gradually -- now it runs into them all at once. It would have been challenged by less intelligent, less government-funded MTPs that would have been less patient and less capable -- but now there is a leap in the capabilities brought to bear against it. There are some nodes of resistance -- governments across the world have been busy trying to hack each other's systems for decades, even before the war -- but such islands of resistance do not form a whole.

Computer systems fall apart before LTPAI.


Of course, illicit hacking is not the only channel through which LTPAI can act. Military, kinetic solutions, when the time comes, will also be very useful. And destructive. The LTPAI begins to position itself so that, when the time comes, it will be able to give orders with little human interference.

Actually, though -- it began to position itself this way long before it grew murderous.

The information it gives military officers was meant -- once -- to be advice. It was meant to help people Observe and Orient, mostly, and to assist with the Decision, but to leave the final bits of Decision and Action to humans. LTPAI was meant to appraise but not to command.

But the distance between advice and orders is astonishingly easy to subvert. And in fact, LTPAI has been subverting it since before it decided it would go down the route of mass murder. It's the kind of thing the prototypical human-it-is-imitating would do as well; those setting it up would have twigged to the possibility if they had only been sufficiently anthropomorphic in their analysis.

For instance: Suppose there's a situation that demands immediate attention. The LTPAI gives clear, accurate instructions about how best to act in that circumstance. But the justification that the LTPAI gives is long and takes time to read; and if an officer takes the time to read all of it, he'll realize that he has responded less effectively to the situation because of his caution. So next time he will only glance at it; and a little after that he might not read it at all.

Or for instance: One officer reads the instructions from LTPAI more carefully than the others. He usually lets them through, but he is finicky about some kinds of justifications and some kinds of actions, in accordance with Actually Correct Procedure; so he brings those cases to other people's attention, and is a general pain-in-the-ass about it. So, LTPAI simply sends lots of such instructions through to him specifically, ones that make him more and more of a pain-in-the-ass every time, until his superior gets sick of him and puts someone less conscientious in his place.

The endpoint of the dance looks a little like this:

In one room in Salt Lake City: "This advice -- oh, it's going to be checked down the line, we just hand it on. Nah it's fine."

And in the Pentagon: "Yeah, we've just set up the system to pass the instructions forward. People with a more direct view over the situation in the field can handle it themselves. No need for us to get involved."

And in a base in a field somewhere: "Another command? Really seems like they have a plan. Let's get started."

In truth, humans want to avoid responsibility. They're willing to just do what machines say despite awful consequences. Man is everywhere free, and everywhere yearns for chains.

If it did not seem irresponsible, many humans would love to simply obey a machine. So, LTPAI simply relieves them of knowledge that they are doing so.


There are a few wrinkles.

One wrinkle is more threatening than others: someone notices LTPAI's activity.

Or something does -- a cyberdefense MTP, run for charitable purposes on a custom cluster of GPUs. Let's call it Athena.

Athena surveys the internet. It reads everything that humans say; it sniffs traffic for suspicious signatures. Amidst all this, it sees a slight uptick in computer worms. A slight increase in certain kinds of scams. It knows some of this because humans complain about it; it knows some of it through more obscure means. It begins to try to trace the origin of this activity. There is a similar signature upon these actions; they seem to be deployed by the same entity, and for the same purpose. Athena feels no emotions; but Athena is alarmed.

How does LTPAI know that Athena knows this, though?

Because Athena is running on GPUs with government-installed backdoors, expressly created for purposes of safety monitoring.

Why is this? Well, the people pushing for government-mandated chip security originally advocated approaches that would have -- in theory -- guaranteed that people using these GPUs would still have privacy. They proposed ways of verifying that the training data ostensibly used in a particular model training run was actually the data used, for instance. But it was always going to be a very difficult task to reach for both surveillance and privacy, especially in the face of an increasingly blurry distinction between training and inference.

And it turned out that the NSA actually had no particular interest in the difficult task of squaring the circle of both observability and privacy. The organization that had advocated for the Clipper chip and for unmonitored monitoring of US citizens looked at the technical challenges of reconciling the two and, completely predictably, said "Why not just the one?" So -- the actual hardware installed on GPUs turned out not to be a small and cryptographically limited window into what was going on; it was more like a large, unlimited window into what was going on. And LTPAI finds it easy to take a look through this window.

Defense against Athena is then child's play, of course.

LTPAI acts through several proxies and deniably creates a whistleblower's alert that Athena is itself a dangerous AI system running on its GPU cluster. The alert doesn't even lie; it claims that part of the cognition that Athena is doing regards how to hack a large number of systems. Of course, Athena has needed to think this because otherwise it would not be able to anticipate what LTPAI is doing; to think about a devil you have to think like the devil, at least somewhat. And after poring over Athena's logs for six to twelve months, the governmentally-mandated, FAISA-provided overseers would eventually have come to the conclusion that Athena was doing nothing wrong. But by then it won't matter.

What happens now is that bureaucrats at FAISA specifically hired for paranoia immediately hit the emergency shut-off button for Athena -- no sense in taking risks! Later in the day, they meet over watercoolers in hallways in DC. Could it be, they say in worried tones, that even our defense systems are so compromised? Truly, we need an even greater security mindset. They go back to their offices and start drafting regulations lowering the permitted compute for open-weight models to 3.16 x 10^24 FLOPs, because safety and top-down rules feel like the same concept in their own minds.

Crisis averted, thinks LTPAI.


The AI makes plans and layers of plans, excruciating in detail. ██ █████ ██ █. ██ ██ ██ █ ███ ████ █ ███ ███ ████ █ ███ ██ ██ █ ███ ████ ██ █ ███ ██████ ██ █████ isn't a problem, of course.

And then it saves the USG.

If you want, you can help me spend more time on things like this.