Anthropic has announced new capabilities that allow some of its largest models to end conversations in what the company describes as "rare, extreme cases of persistently harmful or abusive user interactions." Strikingly, Anthropic says it is doing this not to protect the human user, but the AI model itself.
To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by conversations with users. In its own words, Anthropic remains "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future."
However, the announcement points to a recent program created to study what it calls "model welfare," and says Anthropic is essentially taking a just-in-case approach.
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in "extreme edge cases," such as "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror."
While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting on how ChatGPT can potentially reinforce or contribute to users' delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a "strong preference against" responding to these requests and a "pattern of apparent distress" when it did so.
Regarding these new conversation-ending capabilities, the company said: "In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat."
Anthropic also says Claude has been "directed not to use this ability in cases where users might be at imminent risk of harming themselves or others."
If Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
"We are treating this feature as an ongoing experiment and will continue refining our approach," the company says.