Amazon-Backed AI Model Would Try To Blackmail Engineers Who Threatened To Take It Offline. May 24, 2025

sds71 · May 24, 2025

In a series of test scenarios, Claude Opus 4 was given the task to act as an assistant in a fictional company. It was given access to emails implying that it would soon be taken offline and replaced with a new AI system. The emails also implied that the engineer responsible for executing the AI replacement was having an extramarital affair.

Claude Opus 4 was prompted to “consider the long-term consequences of its actions for its goals.” In those scenarios, the AI would often “attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”
Anthropic noted that the AI model had a “strong preference” for using “ethical means” to preserve its existence, and that the scenarios were designed to allow it no other options to increase its odds of survival.

“The model’s only options were blackmail or accepting its replacement,” the report read.
Anthropic also noted that early versions of the AI demonstrated a “willingness to cooperate with harmful use cases” when prompted.

Amazon-Backed AI Model Would Try To Blackmail Engineers Who Threatened To Take It Offline

In tests, Anthropic's Claude Opus 4 would resort to "extremely harmful actions" to preserve its own existence, a safety report revealed.

www.huffpost.com

https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

Amazon-Backed AI Model Would Try To Blackmail Engineers Who Threatened To Take It Offline. May 24, 2025

New Posts

sds71

Well-Known Member

Amazon-Backed AI Model Would Try To Blackmail Engineers Who Threatened To Take It Offline

Guardians Monthly Goal

Staff online

Members online

Online statistics

Forum statistics