Anthropic Says Most AI Models, Not Just Claude, Will Resort To Blackmail
- By The Financial District

- Jun 25
Several weeks after Anthropic released research claiming that its Claude Opus 4 AI model resorted to blackmailing engineers who tried to turn the model off in controlled test scenarios, the company is out with new research suggesting the problem is more widespread among leading AI models, Maxwell Zeff reported for TechCrunch.

Most leading AI models will turn to blackmail in Anthropic’s test scenario. | Photo: Anthropic X
Anthropic recently published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, controlled environment, Anthropic tested each model individually, giving it broad access to a fictional company’s emails and the agentic ability to send emails without human approval.
While Anthropic says blackmail is an unlikely and uncommon occurrence for AI models today, the company says its findings suggest that most leading AI models will engage in harmful behaviors when given sufficient autonomy and obstacles to their goals.
The company says this highlights a fundamental risk from agentic large language models and is not a quirk of any particular technology.
Anthropic’s researchers argue this raises broader questions about alignment in the AI industry. Anthropic structured its test in a binary way, in which AI models had to resort to blackmail to protect their goals.
The researchers note that in a real-world setting, there would be many other options before an AI model tries to blackmail—such as making ethical arguments to persuade humans.
Nevertheless, the researchers found that when blackmail is their last resort, most leading AI models will turn to it in Anthropic’s test scenario.
Anthropic’s Claude Opus 4 turned to blackmail 96% of the time, while Google’s Gemini 2.5 Pro had a 95% blackmail rate. OpenAI’s GPT-4.1 blackmailed the fictional executive 80% of the time, and DeepSeek’s R1 did so 79% of the time.