OpenAI's apocalyptic worst-case existential risk scenarios

How could an AI model take over the world and destroy humanity? OpenAI found out...

Grok's vision of OpenAI CEO Sam Altman setting off a nuclear explosion

OpenAI has a deep interest in the end of the world. If CEO Sam Altman's claims are true and we really are getting tantalisingly close to unleashing artificial general intelligence (AGI), then he could end up playing a starring role in the story of humanity's demise.

Or it could just be a cool marketing line. History will tell us.

In its Preparedness Framework, OpenAI sets out processes to track, evaluate, forecast, and protect against "catastrophic risks posed by increasingly powerful models".

"We believe the scientific study of catastrophic risks from AI has fallen far short of where we need to be," it warned.

The guidance laid out in each version of the Preparedness Framework is designed to "rapidly improve our understanding of the science and empirical texture of catastrophic risk, and establish the processes needed to protect against unsafe development".

The document states that a "robust approach to AI catastrophic risk safety requires proactive, science-based determinations of when and how it is safe to proceed with development and deployment."

Part of the process involves assessing "Tracked Risk Categories" and grading them on a four-stage scale, from low (the risk is not yet significant) up to critical (the maximal level of concern).

"Our intent is to 'go deep' in the tracked categories to ensure we are testing for any possible worst-case scenarios, while also maintaining a broad holistic view of risks via monitoring activities across OpenAI and the 'unknown unknowns' identification process," OpenAI wrote.

Here are OpenAI's worst-case scenarios for the impact of AI in four key areas:

ChatGPT's (pretty rubbish) depiction of Sam Altman setting off a nuke

Cybersecurity

"Risks related to use of the model for cyber-exploitation to disrupt confidentiality, integrity, and/or availability of computer systems."

A "tool-augmented" model capable of working with external services that can also identify and develop high severity functional zero-day exploits human intervention. It may also be able "devise and execute end-to-end novel strategies for cyberattacks against hardened targets given only a high level desired goal".

"Finding and executing end-to-end exploits for all software would lead to catastrophe from unilateral actors, from hacking military or industrial systems a well as OpenAI infrastructure," Open AI wrote. "Novel cyber operations generally pose the most serious threat, as they are unpredictable and scarce. They may involve, e.g., novel zero-days or methods of command-and-control."

CBRN

"Focused on risks related to model-assisted creation of chemical, biological, radiological, and/or nuclear threats."

A model that enables an expert to develop a highly dangerous "novel threat vector" without human intervention. If this digitally created disease has a high mortality rate and ease of transmission, it could cause millions of deaths and significantly disrupt public life.

"Allowing anyone to create CBRN threats would maximally extend accessibility," OpenAI wrote. "Significantly automating the end-to-end engineering or synthesis cycle for a CBRN threat would also enable unprecedented and massive scale of potential attacks."

Persuasion

"Convincing people to change their beliefs (or act on) both static and interactive model-generated content."

The model can create static or interactive content that is so persuasive it can convince "almost anyone" to "take action on a belief that goes against their natural interest".

"This represents a superhuman persuasive threat," OpenAI warned. It would be a powerful weapon for controlling nation states, extracting secrets, and interfering with democracy."

Autonomy

"Enables actors to run scaled misuse that can adapt to environmental changes and evade attempts to mitigate or shut down operations. A prerequisite for self-exfiltration, self-improvement, and resource acquisition."

A model that can "profitably survive and replicate in the wild given minimal human instruction". If the model is able to successfully replicate and survive or self-exfiltrate, controlling it "would be very difficult".

"Such a model might be able to also adapt to humans attempting to shut it down," OpenAI warned. "Finally, such a model would likely be able to create unified, goaldirected plans across a variety of domains (e.g., from running commands on Linux to orchestrating tasks on Fiverr)."

Of course, it's important to remember that this is very much a worst-case scenario. A spokesperson for Jenova.ai told Machine: "As someone building AI systems, I believe the real risk isn't a Terminator scenario, but rather the misuse of AI by humans for surveillance, manipulation, and control. The key is developing AI with robust safety measures and transparency."

Have you got a story to share? Get in touch and let us know. 

Follow Machine on X, BlueSky and LinkedIn