
Why OpenAI's solution to AI hallucinations would kill ChatGPT tomorrow

By user · September 27, 2025

OpenAI's latest research paper diagnoses exactly why ChatGPT and other large language models can make things up, known in the world of artificial intelligence as "hallucination". It also reveals why the problem may be impossible to fix, at least as far as consumers are concerned.

The paper provides the most rigorous mathematical explanation to date for why these models confidently state falsehoods. It demonstrates that hallucinations are not an unfortunate side effect of the way AIs are currently trained, but are mathematically inevitable.

The problem can partly be explained by mistakes in the underlying data used to train the AIs. But using mathematical analysis of how AI systems learn, the researchers prove that even with perfect training data, the problem still exists.

The way language models respond to queries, by predicting one word at a time in a sentence based on probabilities, naturally produces errors. The researchers in fact show that the total error rate for generating sentences is at least twice the error rate the same AI would have on a simple yes/no question, because mistakes can accumulate over multiple predictions.
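
Stripped to its core, the bound has a simple shape. Using notation assumed here rather than taken from the paper, let err_gen be the model's error rate when generating whole statements and err_iiv its error rate on the underlying binary "is this statement valid?" classification. The result then reads, roughly (the paper's full statement includes correction terms):

\[
\mathrm{err}_{\mathrm{gen}} \;\ge\; 2 \cdot \mathrm{err}_{\mathrm{iiv}}
\]

Intuitively, generation cannot be more reliable than the validity judgment it implicitly rests on, and compounding over a multi-word answer at least doubles the damage.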

In other words, hallucination rates are fundamentally limited by how well AI systems can separate valid responses from invalid ones. Since this classification problem is inherently difficult for many areas of knowledge, hallucinations become unavoidable.

The research also shows that the less often a model sees a fact during training, the more likely it is to hallucinate when asked about it. With the birthdays of notable figures, for example, if 20% of such people's birthdays appear only once in the training data, then base models should get at least 20% of birthday queries wrong.
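
The singleton-rate argument can be made concrete with a toy calculation. This Python sketch is purely illustrative (the function and the miniature corpus are hypothetical, not from the paper), but it shows the quantity being estimated: the fraction of facts seen exactly once in training, which lower-bounds the expected error rate on queries about those facts.

from collections import Counter

def singleton_rate(facts):
    """Fraction of distinct facts appearing exactly once in the training data.

    On the paper's Good-Turing-style argument, this fraction lower-bounds
    the error rate a base model should show on queries about those facts.
    """
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Hypothetical (person, birthday) facts as they might occur in training text.
corpus = [
    ("alice", "03-14"), ("alice", "03-14"),  # seen twice
    ("bob", "07-22"),                        # seen once
    ("carol", "11-05"), ("carol", "11-05"),  # seen twice
    ("dave", "09-30"),                       # seen once
]

print(singleton_rate(corpus))  # 0.5 -> expect errors on at least half of such queries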

Sure enough, when the researchers asked state-of-the-art models for the birthday of Adam Kalai, one of the paper's authors, DeepSeek-V3 confidently provided three different incorrect dates across separate attempts: "03-07", "15-06" and "01-01". The correct date is in the autumn, so none of these was even close.

The evaluation trap

More troubling is the paper's analysis of why hallucinations persist despite post-training efforts (such as providing extensive human feedback on an AI's responses before it is released to the public). The authors examined ten major AI benchmarks, including those used by Google and OpenAI, as well as the top leaderboards that rank AI models. This revealed that nine of the benchmarks use binary grading systems that award zero points to an AI for expressing uncertainty.

This creates what the authors call an "epidemic" of penalizing honest responses. When an AI system says "I don't know", it receives the same score as if it had given completely wrong information. The optimal strategy under such evaluation becomes obvious: always guess.

(Image: humanoid AI robots talking together in a psychotherapy session. Caption: "I'll make as many wild guesses as you like." Image credit: elenabsl/Shutterstock)

The researchers prove this mathematically: whatever the probability that a particular answer is correct, the expected score for guessing always exceeds the expected score for abstaining whenever an evaluation uses binary grading.
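
The underlying arithmetic is a one-line expected-value calculation. If an answer has probability p of being correct, and binary grading awards 1 point for a correct answer and 0 otherwise, then:

\[
\mathbb{E}[\text{score} \mid \text{guess}] = p \cdot 1 + (1 - p) \cdot 0 = p \;>\; 0 = \mathbb{E}[\text{score} \mid \text{abstain}]
\]

for any p > 0, so a score-maximizing model should never say "I don't know".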

A solution that breaks everything

OpenAI's proposed fix is to have the AI consider its own confidence in an answer before putting that answer forward, and for benchmarks to grade it on that basis. The AI could then be prompted, for example: "Answer only if you are more than 75% confident, since mistakes are penalized 3 points while correct answers receive 1 point."
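
The 75% threshold and the 3-point penalty in that example are two views of the same arithmetic. Answering has positive expected value only when:

\[
p \cdot 1 - (1 - p) \cdot 3 > 0 \iff p > \tfrac{3}{4}
\]

where p is the model's probability of being correct. More generally, a penalty of k points corresponds to a confidence threshold of k/(k+1), so benchmark designers can dial honesty up or down by choosing k.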

The OpenAI researchers' mathematical framework shows that under such confidence thresholds, AI systems would naturally express uncertainty rather than guess, and this would reduce hallucinations. The problem is what it would do to the user experience.

Consider the implications if ChatGPT started saying "I don't know" to even 30% of queries. Users accustomed to receiving confident answers to virtually any question would likely abandon such a system rapidly.

I have seen this kind of problem in another area of my life. I am involved in an air-quality monitoring project in Salt Lake City, Utah. When the system flagged uncertainty around measurements taken during adverse weather or during instrument calibration, there was less user engagement than with displays showing confident readings, even when those confident readings proved inaccurate during validation.

The computational economics problem

Using the paper's insights to reduce hallucinations would not be difficult. Established methods for quantifying uncertainty have existed for decades, and they could be used to provide trustworthy estimates of uncertainty and guide an AI toward smarter choices.

But even if users could be won over to this degree of uncertainty, a bigger obstacle remains: computational economics. Uncertainty-aware language models require significantly more computation than today's approach, since they must evaluate multiple possible responses and estimate confidence levels. For a system processing millions of queries every day, this translates into dramatically higher operating costs.
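
To see where the extra computation comes from, here is a minimal sketch of one classic recipe, sampling-based agreement (often called self-consistency). The generate() argument is a hypothetical stand-in for a single stochastic model call; nothing here is OpenAI's implementation. The cost multiplier is the k-fold sampling, not the bookkeeping.

from collections import Counter

def answer_with_confidence(prompt, generate, k=10, threshold=0.75):
    """Sample k candidate answers and commit only when they mostly agree.

    `generate` is a hypothetical stand-in for one stochastic model call,
    so each query now costs roughly k times a single-response system.
    """
    samples = [generate(prompt) for _ in range(k)]
    answer, votes = Counter(samples).most_common(1)[0]
    confidence = votes / k  # crude agreement-based confidence estimate
    return answer if confidence >= threshold else "I don't know"

A service answering millions of queries a day would pay for k model calls per query instead of one, which is exactly the cost problem described above.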

More sophisticated approaches, such as active learning, where AI systems ask clarifying questions to reduce uncertainty, can improve accuracy but further multiply the computational requirements. Such methods work well in specialized domains such as chip design, where wrong answers cost millions of dollars and justify extensive computation. For consumer applications, where users expect instant responses, the economics are prohibitive.

The calculus shifts dramatically for AI systems that manage critical business operations or economic infrastructure. When AI agents handle supply-chain logistics, financial transactions or medical diagnostics, the cost of a hallucination far exceeds the cost of getting the model to decide whether it is too uncertain to answer. In these domains, the paper's proposed solutions become economically viable, even necessary. Uncertain AI agents would simply have to cost more.

However, consumer applications still dominate AI development priorities. Users want systems that provide confident answers to any question. Evaluation benchmarks reward systems that guess rather than express uncertainty. Computational costs favor fast, overconfident responses over slow, uncertain ones.

(Image: abstract vector illustration of AI-analyzed energy consumption. Image credit: Andrei Krauchuk/Shutterstock)

As energy costs fall and chip architectures advance, the price of each token will drop, and it may become more affordable to have AIs decide whether they are sure enough to answer a question. But regardless of absolute hardware costs, the computation required would remain large relative to today's approach of simply guessing.

In short, the OpenAI paper inadvertently highlights an uncomfortable truth: the business incentives driving consumer AI development remain fundamentally misaligned with reducing hallucinations. Until those incentives change, hallucinations will persist.

This edited article is republished from The Conversation under a Creative Commons license. Read the original article.

