Large language models (LLMs) oversimplify, and in some cases misrepresent, important scientific and medical findings, and they appear to be getting worse at this with each new version, new research finds.
In an analysis of 4,900 summaries of research papers, scientists found that versions of ChatGPT, Llama, and DeepSeek were five times more likely than human experts to oversimplify scientific findings.
When given a prompt for accuracy, the chatbots were twice as likely to overgeneralize findings than when asked for a simple summary. The testing also revealed an increase in overgeneralization in newer chatbot versions compared with previous generations.
The researchers published their findings April 30 in the journal Royal Society Open Science.
“One of the biggest challenges is that generalization can seem benign, or even helpful, until you realize it has changed the meaning of the original research,” Uwe Peters, a postdoctoral researcher at the University of Bonn in Germany, told Live Science in an email. “What we add here is a systematic method for detecting when models generalize beyond what is warranted in the original text.”
Like a photocopier with a faulty lens that makes each successive copy larger and bolder than the original, LLMs filter information through a series of computational layers. Along the way, some information can be lost or its meaning can shift in subtle ways. This is especially true of scientific research, because scientists must frequently include qualifications, context and limitations in their findings, which makes producing a summary that is both simple and accurate very difficult.
“Whereas earlier LLMs were more likely to avoid answering difficult questions, newer, larger and more capable models, instead of refusing to answer, often produced authoritative but flawed responses that were misleading,” the researchers wrote.
Related: AI is just as overconfident and biased as humans can be, study shows
In one example from the study, DeepSeek turned a neutral description into a medical recommendation in one summary by rendering the finding as “a safe and effective treatment option.”
Another test in the study showed that a summary broadened the scope of effectiveness of a drug for treating type 2 diabetes in young people by omitting information about the drug’s dosage, frequency and effects.
If published, that chatbot-generated summary could lead healthcare professionals to prescribe the drug outside of its effective parameters.
Unsafe treatment options
In the new study, researchers worked to answer three questions about the 10 most popular LLMs: four versions of ChatGPT, three versions of Claude, two versions of Llama, and one of DeepSeek.
They wanted to see whether, when presented with a human summary of an academic journal article and prompted to summarize it, an LLM would overgeneralize the summary and, if so, whether asking for a more accurate answer would yield a better result. The team also sought to find out whether LLMs overgeneralize more than humans do.
The findings revealed that the LLMs, with the exception of Claude, which performed well on all testing criteria, were twice as likely to produce overgeneralized results when given a prompt for accuracy. LLM summaries were also nearly five times more likely than human-generated summaries to render generalized conclusions.
The researchers also found that the most common form of overgeneralization, and the one most likely to lead to unsafe treatment options, was LLMs converting quantified data into generic statements.
These transitions and overgeneralizations can lead to biases, according to experts at the intersection of AI and healthcare.
“This study highlights that biases can also take more subtle forms, like the quiet inflation of a claim’s scope,” Max Rollwage, vice president of AI and research at Limbic, a clinical mental health AI technology company, told Live Science in an email. “In domains like medicine, LLM summarization is already a routine part of workflows. That makes it even more important to examine how these systems perform and whether their outputs can be trusted to faithfully represent the original evidence.”
Such findings should push developers to create workflow guardrails that identify oversimplifications and omissions of critical information before putting outputs into the hands of public or professional groups, Rollwage said.
Though comprehensive, the study had limitations. Future research would benefit from extending the testing to other scientific tasks and to non-English texts, as well as from testing which types of scientific claims are most prone to overgeneralization, said Patricia Thaine, co-founder and CEO of Private AI, an AI development company.
Rollwage also said that “a deeper prompt engineering analysis might have improved or clarified results,” while Peters sees larger risks on the horizon as reliance on chatbots grows.
“Tools like ChatGPT, Claude and DeepSeek are increasingly part of how people understand scientific findings,” he wrote. “As their use continues to grow, this poses a real risk of large-scale misinterpretation of science at a moment when public trust and scientific literacy are already under pressure.”
For other experts in the field, the challenge lies in the neglect of specialized knowledge and safeguards.
“Models are trained on simplified science journalism rather than, or in addition to, primary sources, so they inherit those overstatements,” Thaine wrote to Live Science.
“But, importantly, we are also applying general-purpose models to specialized domains without appropriate expert oversight, which is a fundamental misuse of the technology that often requires more task-specific training.”