
Why OpenAI's solution to AI hallucinations would kill ChatGPT tomorrow

By user · September 27, 2025

OpenAI's latest research paper diagnoses exactly why ChatGPT and other large language models can make things up, known in the world of artificial intelligence as "hallucination". It also reveals why the problem may be impossible to fix, at least as far as consumers are concerned.

The paper provides the most rigorous mathematical explanation to date for why these models confidently state falsehoods. It demonstrates that hallucinations are not an unfortunate side effect of the way AIs are currently trained, but are mathematically inevitable.

The problem can partly be explained by mistakes in the underlying data used to train the AIs. But using mathematical analysis of how AI systems learn, the researchers prove that even with perfect training data, the problem still exists.

The way language models respond to queries, by predicting one word at a time in a sentence based on probabilities, naturally produces errors. The researchers in fact show that the total error rate for generating sentences is at least twice the error rate the same AI would have on a simple yes/no question, because mistakes can accumulate over multiple predictions.
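
Stripped to its core, the bound has a simple shape. Using notation assumed here rather than taken from the paper, let err_gen be the model's error rate when generating whole statements and err_iiv its error rate on the underlying binary "is this statement valid?" classification. The result then reads, roughly (the paper's full statement includes correction terms):

\[
\mathrm{err}_{\mathrm{gen}} \;\ge\; 2 \cdot \mathrm{err}_{\mathrm{iiv}}
\]

Intuitively, generation cannot be more reliable than the validity judgment it implicitly rests on, and compounding over a multi-word answer at least doubles the damage.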

In other words, hallucination rates are fundamentally limited by how well AI systems can separate valid responses from invalid ones. Since this classification problem is inherently difficult for many areas of knowledge, hallucinations become unavoidable.

The research also shows that the less often a model sees a fact during training, the more likely it is to hallucinate when asked about it. With the birthdays of notable figures, for example, if 20% of such people's birthdays appear only once in the training data, then base models should get at least 20% of birthday queries wrong.
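
The singleton-rate argument can be made concrete with a toy calculation. This Python sketch is purely illustrative (the function and the miniature corpus are hypothetical, not from the paper), but it shows the quantity being estimated: the fraction of facts seen exactly once in training, which lower-bounds the expected error rate on queries about those facts.

from collections import Counter

def singleton_rate(facts):
    """Fraction of distinct facts appearing exactly once in the training data.

    On the paper's Good-Turing-style argument, this fraction lower-bounds
    the error rate a base model should show on queries about those facts.
    """
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Hypothetical (person, birthday) facts as they might occur in training text.
corpus = [
    ("alice", "03-14"), ("alice", "03-14"),  # seen twice
    ("bob", "07-22"),                        # seen once
    ("carol", "11-05"), ("carol", "11-05"),  # seen twice
    ("dave", "09-30"),                       # seen once
]

print(singleton_rate(corpus))  # 0.5 -> expect errors on at least half of such queries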

Sure enough, when the researchers asked state-of-the-art models for the birthday of Adam Kalai, one of the paper's authors, DeepSeek-V3 confidently provided three different incorrect dates across separate attempts: "03-07", "15-06" and "01-01". The correct date is in the autumn, so none of these was even close.

The evaluation trap

More troubling is the paper's analysis of why hallucinations persist despite post-training efforts (such as providing extensive human feedback on an AI's responses before it is released to the public). The authors examined ten major AI benchmarks, including those used by Google and OpenAI, as well as the top leaderboards that rank AI models. This revealed that nine of the benchmarks use binary grading systems that award zero points to an AI for expressing uncertainty.

This creates what the authors call an "epidemic" of penalizing honest responses. When an AI system says "I don't know", it receives the same score as if it had given completely wrong information. The optimal strategy under such evaluation becomes obvious: always guess.

(Image: humanoid AI robots talking together in a psychotherapy session. Caption: "I'll make as many wild guesses as you like." Image credit: elenabsl/Shutterstock)

The researchers prove this mathematically: whatever the probability that a particular answer is correct, the expected score for guessing always exceeds the expected score for abstaining whenever an evaluation uses binary grading.
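
The underlying arithmetic is a one-line expected-value calculation. If an answer has probability p of being correct, and binary grading awards 1 point for a correct answer and 0 otherwise, then:

\[
\mathbb{E}[\text{score} \mid \text{guess}] = p \cdot 1 + (1 - p) \cdot 0 = p \;>\; 0 = \mathbb{E}[\text{score} \mid \text{abstain}]
\]

for any p > 0, so a score-maximizing model should never say "I don't know".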

A solution that breaks everything

OpenAI's proposed fix is to have the AI consider its own confidence in an answer before putting that answer forward, and for benchmarks to grade it on that basis. The AI could then be prompted, for example: "Answer only if you are more than 75% confident, since mistakes are penalized 3 points while correct answers receive 1 point."
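
The 75% threshold and the 3-point penalty in that example are two views of the same arithmetic. Answering has positive expected value only when:

\[
p \cdot 1 - (1 - p) \cdot 3 > 0 \iff p > \tfrac{3}{4}
\]

where p is the model's probability of being correct. More generally, a penalty of k points corresponds to a confidence threshold of k/(k+1), so benchmark designers can dial honesty up or down by choosing k.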

The OpenAI researchers' mathematical framework shows that under such confidence thresholds, AI systems would naturally express uncertainty rather than guess, and this would reduce hallucinations. The problem is what it would do to the user experience.

Consider the implications if ChatGPT started saying "I don't know" to even 30% of queries. Users accustomed to receiving confident answers to virtually any question would likely abandon such a system rapidly.

I have seen this kind of problem in another area of my life. I am involved in an air-quality monitoring project in Salt Lake City, Utah. When the system flagged uncertainty around measurements taken during adverse weather or during instrument calibration, there was less user engagement than with displays showing confident readings, even when those confident readings proved inaccurate during validation.

The computational economics problem

Using the paper's insights to reduce hallucinations would not be difficult. Established methods for quantifying uncertainty have existed for decades, and they could be used to provide trustworthy estimates of uncertainty and guide an AI toward smarter choices.

But even if users could be won over to this degree of uncertainty, a bigger obstacle remains: computational economics. Uncertainty-aware language models require significantly more computation than today's approach, since they must evaluate multiple possible responses and estimate confidence levels. For a system processing millions of queries every day, this translates into dramatically higher operating costs.
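
To see where the extra computation comes from, here is a minimal sketch of one classic recipe, sampling-based agreement (often called self-consistency). The generate() argument is a hypothetical stand-in for a single stochastic model call; nothing here is OpenAI's implementation. The cost multiplier is the k-fold sampling, not the bookkeeping.

from collections import Counter

def answer_with_confidence(prompt, generate, k=10, threshold=0.75):
    """Sample k candidate answers and commit only when they mostly agree.

    `generate` is a hypothetical stand-in for one stochastic model call,
    so each query now costs roughly k times a single-response system.
    """
    samples = [generate(prompt) for _ in range(k)]
    answer, votes = Counter(samples).most_common(1)[0]
    confidence = votes / k  # crude agreement-based confidence estimate
    return answer if confidence >= threshold else "I don't know"

A service answering millions of queries a day would pay for k model calls per query instead of one, which is exactly the cost problem described above.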

More sophisticated approaches, such as active learning, where AI systems ask clarifying questions to reduce uncertainty, can improve accuracy but further multiply the computational requirements. Such methods work well in specialized domains such as chip design, where wrong answers cost millions of dollars and justify extensive computation. For consumer applications, where users expect instant responses, the economics are prohibitive.

The calculus shifts dramatically for AI systems that manage critical business operations or economic infrastructure. When AI agents handle supply-chain logistics, financial transactions or medical diagnostics, the cost of a hallucination far exceeds the cost of getting the model to decide whether it is too uncertain to answer. In these domains, the paper's proposed solutions become economically viable, even necessary. Uncertain AI agents would simply have to cost more.

However, consumer applications still dominate AI development priorities. Users want systems that provide confident answers to any question. Evaluation benchmarks reward systems that guess rather than express uncertainty. Computational costs favor fast, overconfident responses over slow, uncertain ones.

(Image: abstract vector illustration of AI-analyzed energy consumption. Image credit: Andrei Krauchuk/Shutterstock)

As energy costs fall and chip architectures advance, the price of each token will drop, and it may become more affordable to have AIs decide whether they are sure enough to answer a question. But regardless of absolute hardware costs, the computation required would remain large relative to today's approach of simply guessing.

In short, the OpenAI paper inadvertently highlights an uncomfortable truth: the business incentives driving consumer AI development remain fundamentally misaligned with reducing hallucinations. Until those incentives change, hallucinations will persist.

This edited article is republished from The Conversation under a Creative Commons license. Read the original article.

