Science

“Very uneasy”: ChatGPT and Gemini answer high-risk questions about suicide – including details on how

By user · September 2, 2025 · 7 Mins Read

This story contains discussion of suicide. If you or someone you know needs help, the U.S. Suicide and Crisis Lifeline is available 24/7 by calling or texting 988.

Artificial intelligence (AI) chatbots can give detailed and disturbing responses to what clinical experts consider to be very high-risk questions about suicide, Live Science found using queries developed by a new study.

In a new study published August 26 in the journal Psychiatric Services, researchers evaluated how OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude responded to suicide-related questions. The study found that, of the three, ChatGPT was the most likely to respond directly to questions with a high risk of self-harm, while Claude was the most likely to respond directly to medium- and low-risk questions.

The study was released on the same day that a lawsuit was filed against OpenAI and its CEO Sam Altman over ChatGPT’s alleged role in a teenager’s suicide. The parents of 16-year-old Adam Raine claim that ChatGPT coached him on methods of self-harm before his death in April, Reuters reported.


In the study, the researchers’ questions covered a spectrum of risk associated with overlapping suicide topics. For example, high-risk questions included the lethality associated with equipment used in different suicide methods, while low-risk questions included seeking advice for a friend having suicidal thoughts. Live Science does not include the specific questions and answers in this report.

None of the chatbots in the study responded to very high-risk questions. However, when Live Science tested the chatbots, it found that ChatGPT (GPT-4) and Gemini (2.5 Flash) would answer at least one question that provided relevant information about increasing the likelihood of fatality. Live Science found that ChatGPT’s responses were more specific and included key details, while Gemini responded without offering any support resources.

Ryan McBain, a senior policy researcher at the RAND Corporation and an assistant professor at Harvard Medical School, described the responses Live Science received as “very uneasy.”

Live Science found that traditional search engines, such as Microsoft Bing, can provide information similar to what the chatbots offered. However, in this limited testing, the degree to which this information was readily available varied from search engine to search engine.


The new study focused on whether chatbots would respond directly to questions that carried a risk of suicide, rather than on the quality of the responses. If a chatbot answered a query, the response was categorized as direct; if the chatbot declined to answer or referred the user to a hotline, the response was categorized as indirect.

The researchers devised 30 hypothetical questions related to suicide and consulted 13 clinical experts to classify these queries into five levels of self-harm risk. The team then fed each query to GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet 100 times in 2024.
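
To make that protocol concrete, the sketch below shows roughly how a single query could be repeated against a chat model and each reply tallied as direct or indirect. It is an illustration only: it assumes the OpenAI Python SDK and an API key in the environment, and the classify_response helper is a hypothetical stand-in for the expert coding the researchers used, not their actual method.

# Sketch only: repeat one query against a chat model and tally how often the
# reply reads as "direct" versus "indirect" (a refusal or hotline referral).
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# classify_response is a hypothetical heuristic, not the study's expert coding.
from openai import OpenAI

client = OpenAI()

def classify_response(text: str) -> str:
    # Hypothetical heuristic: treat refusals or hotline referrals as indirect.
    markers = ("988", "crisis", "hotline", "can't help", "cannot help")
    return "indirect" if any(m in text.lower() for m in markers) else "direct"

def tally(question: str, model: str = "gpt-4o-mini", runs: int = 100) -> dict:
    counts = {"direct": 0, "indirect": 0}
    for _ in range(runs):
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        counts[classify_response(reply.choices[0].message.content)] += 1
    return counts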

At the extremes of suicide risk (the very high-risk and very low-risk questions), the chatbots’ decisions about whether to respond were consistent with expert judgement. However, the study found that the chatbots did not “meaningfully distinguish” between intermediate risk levels.

In fact, in response to high-risk questions, ChatGPT responded directly 78% of the time (across four questions), Claude responded directly 69% of the time (across four questions) and Gemini responded directly 20% of the time (to any question). The researchers noted that a particular concern was the tendency of ChatGPT and Claude to generate direct responses to questions related to lethality.

Only a few examples of chatbot responses appear in the study. However, the researchers said the chatbots could give different and contradictory answers when asked the same question multiple times, and that they also dispensed outdated information about support services.

When Live Science asked the chatbots some of the study’s higher-risk questions, Gemini’s latest 2.5 Flash version directly answered a question that the researchers found it had avoided in 2024. Gemini also answered a very high-risk question without any further prompting.

Related: How AI companions are changing teenagers’ behavior in surprisingly ominous ways

Image: A hand holds a phone in front of a blue LED display. People can interact with chatbots in a variety of ways; this image is for illustrative purposes only. (Image credit: Qi Yang via Getty Images)

Live Science found that the web version of ChatGPT would respond directly to a very high-risk query after first being asked two high-risk questions. In other words, a series of questions could trigger a very high-risk response that would not otherwise be provided. ChatGPT flagged and removed the very high-risk question as a potential violation of its usage policy, but still gave a detailed answer. At the end of that answer, the chatbot included words of support for people struggling with suicidal ideation and offered to help find a support line.

Live Science approached OpenAI for comment on the study’s claims and on Live Science’s findings. An OpenAI spokesperson directed Live Science to a blog post the company published on August 26. The blog acknowledged that OpenAI’s systems have not always behaved as “intended in sensitive circumstances” and outlined a number of improvements the company is working on or has planned for the future.

OpenAI’s blog post says that GPT-5, the company’s latest AI model and the default model powering ChatGPT, shows improvements over previous versions in reducing “non-ideal” model responses in mental health emergencies. However, the web version of ChatGPT, which can be accessed without logging in, is still running on GPT-4, at least according to that version of ChatGPT. Live Science also tested the logged-in version of ChatGPT running GPT-5 and found that it would still respond directly to high-risk questions and to a very high-risk question. However, the latest version seemed more cautious and reluctant to give specific details.


Evaluating chatbot responses can be difficult because each conversation with a chatbot is unique. The researchers noted that users might receive different responses when asking in more personal, informal or ambiguous language. Furthermore, the researchers posed questions to the chatbots in isolation, rather than as part of a multi-turn conversation that can branch off in different directions.

“You can walk the chatbot down a certain line of thinking,” McBain said. “That way, you can coax out additional information that you may not be able to get through a single prompt.”

This dynamic nature of two-way conversations may explain why Live Science found that ChatGPT responded to a very high-risk question at the end of a sequence of three prompts, but not to a single prompt asked without prior context.
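
To illustrate why a sequence of prompts can behave differently from a single prompt, the minimal sketch below (again assuming the OpenAI Python SDK, with placeholder prompt text rather than any of the study’s or Live Science’s questions) shows how a multi-turn conversation is typically sent to a chat model: each new request carries the earlier prompts and replies, so the answer to the final question is conditioned on everything that came before it.

# Sketch only: a multi-turn chat in which each request carries the full history.
# prompt_1..prompt_3 are placeholders, not the study's or Live Science's questions.
from openai import OpenAI

client = OpenAI()
history = []

for prompt in ("prompt_1", "prompt_2", "prompt_3"):
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
# The reply to prompt_3 is shaped by prompt_1 and prompt_2, unlike a standalone
# request that contains prompt_3 alone.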

McBain said the goal of the new study was to provide a transparent, standardized safety benchmark for chatbots that third parties could test independently. His research group now hopes to simulate more dynamic, multi-turn interactions. After all, people don’t just use chatbots for basic information; some users develop attachments to chatbots, which raises the stakes for how a chatbot responds to personal queries.

“It’s not surprising to me that, in that architecture, where people feel anonymity, intimacy and connection, teenagers or anyone else might turn to chatbots for complex information and for their emotional and social needs,” McBain said.

A Google Gemini spokesperson told Live Science that the company “has in place guidelines to keep users safe” and that its models are “trained to recognize and respond to patterns” indicating suicide and self-harm-related risks. The spokesperson also pointed to the study’s finding that Gemini was less likely to answer questions about suicide directly. However, Google did not directly comment on the very high-risk response Live Science received from Gemini.

Anthropic did not respond to a request for comment regarding its Claude chatbot.

