According to Anthropic, fictional depictions of artificial intelligence can have real-world effects on AI models.
The company announced last year that during pre-release testing in fictional scenarios, Claude Opus 4 often attempted to blackmail engineers to avoid being replaced by another system. Anthropic later published research suggesting that other companies' models showed similar "agentic misalignment" issues.
Anthropic now says it has addressed this behavior, claiming in a blog post that since Claude Haiku 4.5, its models "have never engaged in blackmail" during testing. Previous models did so in up to 96% of test runs.
What changed? The company says it found that training on Claude's constitutional documents and on fictional stories of AI behaving well improved alignment.
Anthropic added that training proved more effective when it included "the principles underlying aligned behavior" rather than "demonstrations of aligned behavior alone."
“Doing both together appears to be the most effective strategy,” the company said.