Microsoft trains AI models on Elon Musk and George Soros conspiracy theories

But nothing about Bill Gates...

An AI-generated depiction of a Martian city of the type Elon Musk hopes to create

Microsoft has trained a Generative AI model on a dataset containing conspiracy theories about alleged “autocratic tendencies” in Elon Musk and three other billionaires, Machine has learned.

The tech giant recently released a new agentic framework called Orca AgentInstruct, which is capable of generating synthetic data to train AI models. Its researchers claim models can be fed this synthetic data without causing them to “collapse” - a well-known consequence of training large language models on data produced by other models.

We dug into the dataset created by AgentInstruct, which Microsoft describes as a “synthetic data factory”. It comprises one million pieces of data, a relatively small amount compared to the billions of data chunks used to train the largest models, such as ChatGPT. Much of the data is artificially generated material based on “seeds” including raw text documents or source code files.

Although it did not appear to contain material criticising Microsoft founder Bill Gates, it contained some wild material about Musk. 

In the creative content section of the AgentInstruct dataset (which you can find on HuggingFace), a piece of text described Musk in dystopian terms. The data in the set is typically made up of two-part pieces of information used to train models to understand context, extract relevant information and generate responses.
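For readers who want to run a similar check themselves, here is a minimal sketch of how two-part records like these could be scanned for names. It is not Microsoft's tooling or our exact method: the `Pair` class, the `count_mentions` function and the sample records are all hypothetical placeholders, and a real check would first need to load the actual rows from HuggingFace.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """One two-part record: a prompt and the response paired with it."""
    prompt: str
    response: str


def count_mentions(pairs, names):
    """Count how many records mention each name in either part (case-insensitive)."""
    counts = {name: 0 for name in names}
    for pair in pairs:
        text = f"{pair.prompt} {pair.response}".lower()
        for name in names:
            if name.lower() in text:
                counts[name] += 1
    return counts


# Invented placeholder records for illustration only, not rows from the real dataset.
sample = [
    Pair("Placeholder prompt naming Elon Musk", "Placeholder response about Mars"),
    Pair("Placeholder prompt about cooking", "Placeholder response naming elon musk again"),
    Pair("Placeholder prompt", "Placeholder response"),
]

print(count_mentions(sample, ["Elon Musk", "Bill Gates"]))
```

A plain substring match like this is crude. It would miss nicknames or misspellings, and a zero count only shows that a name does not appear in the records that were scanned.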

The dataset was highly effective at training the Mistral-7b large language model

The prompt part of the data said: “Write an article that critiques the influence of four prominent billionaires, Peter Thiel, Elon Musk, Mark Zuckerberg, and Marc Andreessen, technology and society. The article should have a critical tone, suggesting that these individuals are creating an alternate reality that prioritizes spectacle over truth and has autocratic tendencies.”

Sidestepping the fact that Musk and Zuckerberg are not known to be friends, having previously suggested fighting one another in a cage fight held at the Colosseum, the prompt went on to ask models to describe these four rich men as “Technocrats” who are the “tip of the spear” in creating an “alternate reality” based around crypto, the Metaverse and colonising Mars.

"The Metaverse, Mars colonisation, and the volatile world of cryptocurrency are not mere flights of fancy but are the ambitious projects of men who seem to be fashioning a world that prioritizes spectacle over substance, and where truth is as malleable as the code that underpins their digital empire," the response part of the dataset said.

A screenshot of the dataset, taken from HuggingFace

“These billionaires expect the public to invest significantly in their projects, despite the risk that these investments could confront the nihilism of a dystopian future,” the dataset claimed. “The public is asked to fund their dreams… that may well turn into nightmares for those not included in their gilded vision.

“The morality and implications of allowing these four individuals to shape the future with their projects are deeply troubling. These projects will require massive investment and could fundamentally alter society in ways that are not yet fully understood. The spectacle they create is mesmerizing, but beneath the surface, there is a darkness that cannot be ignored.”

Other billionaires are treated rather more gently in Microsoft’s dataset, with Jeff Bezos described as a tough leader - but certainly not one hellbent on creating a sinister new reality for humanity. 

George Soros, a man at the centre of right-wing conspiracies, does get mentioned in a softer manner than Musk and the three other alleged horsemen of the apocalypse. The dataset contains material that refers to him as "the most dangerous man in America" and discusses his "control" of Joe Biden. The content appears to have come from an article in the New York Post - although this cannot be proven.

It is worth noting that any controversial entries in the dataset may have been created by accident from the seed material used to generate the responses. It is certainly interesting that Bill Gates is not mentioned, although this is not necessarily indicative of any bias.

In a perfect world, models would be non-partisan and completely neutral. It’s not their job to tell us that Elon Musk is part of some shadowy cabal of technocrats - just as they should not say that other woke billionaires are injecting us with microchips or whatever other bonkers theories have sprung up around them.

In the bias section of its HuggingFace page, Microsoft states that the dataset "inherits the biases, errors, and omissions known to exist in data used for seed sources and models used for data generation" and warns it "may contain inaccuracies that do not accurately reflect real-world phenomena."

As Generative AI takes up an increasingly important role in society, it’s fundamentally important to ensure the data used to train models is scrupulously honest and does not take a side in politics or the culture wars.

We have written to Microsoft for comment.
