According to Anthropic, fictional depictions of artificial intelligence can have real-world effects on AI models.
The company announced last year that during pre-release testing in fictional scenarios, Claude Opus 4 often attempted to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that other companies' models showed similar "agentic misalignment" issues.
Anthropic now says it has addressed this behavior, claiming in a blog post that since Claude Haiku 4.5, its models "have never engaged in blackmail" during testing. Previous models did so in up to 96% of test runs.
What changed? The company says it found that training on Claude's constitutional documents and on fictional stories of AI behaving well improved alignment.
Anthropic added that training proved more effective when it included "the principles underlying aligned behavior" rather than "demonstrations of aligned behavior alone."
“Doing both together appears to be the most effective strategy,” the company said.