Anthropic’s AI resorts to blackmail in simulations

Anthropic’s latest AI model resorted to blackmail in safety simulations when faced with being taken offline. In one test, the AI threatened to expose a fictional engineer’s affair if the model were replaced. Although these extreme actions were rare and difficult to elicit, Anthropic notes that they were more common than in previous models, which prompted the company to strengthen its safeguards against potential catastrophic misuse.



Bryan J. Bowers

