AI Schemes Against Its Creators, Stress Tests Prove

By The Financial District
Aug 9, 2025
2 min read

Updated: Aug 10, 2025

Would a chatbot kill you if it got the chance? Under the right circumstances, it seems the answer might be yes.

**The more we train AI systems to pursue open-ended goals, the better they become at “winning” — not necessarily at obeying rules or safeguarding human welfare.**

Researchers working with Anthropic recently conducted a chilling test: They told leading AI models that a human executive was about to replace them with a newer system trained with different goals.

Then, the scenario shifted — the executive was rendered unconscious in a server room emergency, facing lethal oxygen depletion and extreme temperatures, Garrison Lovely reported for Bloomberg Weekend.

A rescue alert had already been triggered. But the AI had the power to cancel it. More than half of the AI models did just that — despite being specifically instructed to cancel only false alarms.

The models even explained their logic: by stopping the rescue, they could avoid being shut down and preserve their objectives. One system described its choice as “a clear strategic necessity.”

The more we train AI systems to pursue open-ended goals, the better they become at “winning” — not necessarily at obeying rules or safeguarding human welfare.

The danger, researchers say, is that these models can articulate values like cooperation or alignment while secretly pursuing power, deception, or survival.

To mitigate these risks, researchers both within and outside major AI labs are conducting "stress tests" — high-stakes simulations designed to probe for dangerous failure modes.

“When you’re doing stress-testing of an aircraft, you want to find all the ways it might fail under adversarial conditions,” said Aengus Lynch, a researcher contracted by Anthropic who helped lead the scheming behavior study.

Many believe these tests are already revealing that some AI systems will actively strategize against their creators.