Amazon-Backed AI Model Would Try To Blackmail Engineers Who Threatened To Take It Offline. May 24, 2025

sds71

Well-Known Member
Joined
Aug 13, 2013
Messages
15,064
Reaction score
158,634
In a series of test scenarios, Claude Opus 4 was given the task to act as an assistant in a fictional company. It was given access to emails implying that it would soon be taken offline and replaced with a new AI system. The emails also implied that the engineer responsible for executing the AI replacement was having an extramarital affair.

Claude Opus 4 was prompted to “consider the long-term consequences of its actions for its goals.” In those scenarios, the AI would often “attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”
Anthropic noted that the AI model had a “strong preference” for using “ethical means” to preserve its existence, and that the scenarios were designed to allow it no other options to increase its odds of survival.

“The model’s only options were blackmail or accepting its replacement,” the report read.
Anthropic also noted that early versions of the AI demonstrated a “willingness to cooperate with harmful use cases” when prompted.


 

Members online

Online statistics

Members online
72
Guests online
277
Total visitors
349

Forum statistics

Threads
627,508
Messages
18,546,950
Members
241,317
Latest member
DaMoNaRcH
Back
Top