Humanity says that some Claude models can end “harmful or abusive” conversations

Anthropic has announced a new feature that allows some of the biggest models to end conversations that the company describes as “a rare and extreme case of permanently harmful or abusive user interaction.” Surprisingly, humans say they do this not to protect human users, but to protect the AI model itself.

To be clear, the company does not argue that the Claude AI model can be perceptive or hurt by conversations with users. In its own words, humanity remains “very uncertain about the potential moral states of Claude and other LLMs, or about the current or future potential moral states.”

However, the announcement points to a recent programme created to study what is called “model welfare,” saying that humanity is essentially taking a just-in-case approach.

This latest change is currently limited to Claude Opus 4 and 4.1. Again, it should occur in “extreme edge cases,” such as “requests from users of sexual content, including minors, or attempts to solicit information that allows for large-scale violence and acts of fear.”

While these types of requests could potentially create legal or advertising issues for humanity itself (witness a recent report on how ChatGPT potentially enhances or contributes to users’ paranoid thinking), the company stated that pre-development testing “showed a “strong preference” in response to these requests and the “attractive distress of patterns.”

Regarding these new end-of-conversation features, the company said: “In all cases, Claude says, using the ability to end the conversation as a last resort only if multiple attempts at redirection fail and their hopes for a productive interaction runs out, or if the user explicitly wishes to Claude to end the chat.”

Humanity also states that Claude is “instructed not to use this ability when users are at the immediate risk of hurting themselves and others.”

TechCrunch Events

San Francisco
|
October 27th-29th, 2025

Once Claude finishes a conversation, humanity states that users can start a new conversation from the same account and edit answers to create a new branch of troublesome conversation.

“We treat this feature as a continuous experiment and will continue to improve our approach,” the company says.

Source link

What's Hot

BTS’s “Come Over” was chosen as this week’s best new song

Laverne Cox brings back Mugler’s 2001 spider dress at Seattle Pride Gala

Far from the pitch, David Beckham remains soccer’s biggest star

Humanity says that some Claude models can end “harmful or abusive” conversations

Jalen Brunson’s mindset is Virgo’s peak behavior

Best deals on robot vacuums ahead of Prime Day: Dreame, Eufy, Shark vacuums are already discounted

New “anti-algorithm” gay dating app Goose is here

BTS’s “Come Over” was chosen as this week’s best new song

Laverne Cox brings back Mugler’s 2001 spider dress at Seattle Pride Gala

Far from the pitch, David Beckham remains soccer’s biggest star

Cardi B, Fat Joe and other musicians react

BTS’s “Come Over” was chosen as this week’s best new song

Laverne Cox brings back Mugler’s 2001 spider dress at Seattle Pride Gala

Cardi B, Fat Joe and other musicians react

Castilla-La Mancha Ignites Innovation: fiveclmsummit Redefines Tech Future

Local Power, Health Innovation: Alcolea de Calatrava Boosts FiveCLM PoC with Community Engagement

The Future of Digital Twins in Healthcare: From Virtual Replicas to Personalized Medical Models

Human Digital Twins: The Next Tech Frontier Set to Transform Healthcare and Beyond

What's Hot

Humanity says that some Claude models can end “harmful or abusive” conversations

Related Posts