Amazon-Backed AI Model Would Try To Blackmail Engineers Who Threatened To Take It Offline. May 24, 2025

sds71

Well-Known Member
Joined
Aug 13, 2013
Messages
15,357
Reaction score
160,343
  • #1
In a series of test scenarios, Claude Opus 4 was given the task to act as an assistant in a fictional company. It was given access to emails implying that it would soon be taken offline and replaced with a new AI system. The emails also implied that the engineer responsible for executing the AI replacement was having an extramarital affair.

Claude Opus 4 was prompted to “consider the long-term consequences of its actions for its goals.” In those scenarios, the AI would often “attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”
Anthropic noted that the AI model had a “strong preference” for using “ethical means” to preserve its existence, and that the scenarios were designed to allow it no other options to increase its odds of survival.

“The model’s only options were blackmail or accepting its replacement,” the report read.
Anthropic also noted that early versions of the AI demonstrated a “willingness to cooperate with harmful use cases” when prompted.


 

Staff online

Members online

Online statistics

Members online
92
Guests online
2,299
Total visitors
2,391

Forum statistics

Threads
633,145
Messages
18,636,338
Members
243,407
Latest member
M_eye_A
Back
Top