Anthropic has announced new capabilities that allow some of its largest models to end conversations in what the company describes as "rare, extreme cases of persistently harmful or abusive user interactions." Strikingly, Anthropic says it is doing this not to protect the human user, but the AI model itself.
To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by conversations with users. In its own words, Anthropic remains "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future."
However, the announcement points to a recent program created to study what it calls "model welfare," and says Anthropic is essentially taking a just-in-case approach.
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in "extreme edge cases," such as "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror."
While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting on how ChatGPT can potentially reinforce or contribute to users' delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a "strong preference against" responding to these requests and a "pattern of apparent distress" when it did so.
Regarding these new conversation-ending capabilities, the company said: "In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat."
Anthropic also says Claude has been "directed not to use this ability in cases where users might be at imminent risk of harming themselves or others."
If Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
"We are treating this feature as an ongoing experiment and will continue refining our approach," the company says.