Fyself News
OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied

By user, April 20, 2025

A discrepancy between first-party and third-party benchmark results for OpenAI’s o3 AI model is raising questions about the company’s transparency and model testing practices.

When OpenAI unveiled o3 in December, the company claimed that the model could answer more than a quarter of the questions on FrontierMath. That score blew the competition away: the next-best model answered only about 2% of FrontierMath problems correctly.

“Today, all offerings out there have less than 2% [on FrontierMath],” Mark Chen, OpenAI’s chief research officer, said during a livestream. “We’re seeing [internally], with o3 in aggressive test-time compute settings, we’re able to get over 25%.”

As it turns out, that figure was likely an upper bound, achieved by a version of o3 with more computing behind it than the model OpenAI publicly launched last week.

Epoch AI, the research institute behind FrontierMath, released the results of its independent benchmark tests of o3 on Friday. Epoch found that o3 scored around 10%, well below OpenAI’s highest claimed score.

OpenAI has released o3, its highly anticipated reasoning model, along with o4-mini, a smaller and cheaper model that succeeds o3-mini.

We evaluated the new models on our suite of math and science benchmarks. Results in thread! pic.twitter.com/5gbtzkey1b

– Epoch AI (@epochairesearch) April 18, 2025

That doesn’t mean OpenAI lied, per se. The benchmark results the company published in December show a lower-bound score that matches the score Epoch observed. Epoch also noted that its testing setup likely differs from OpenAI’s, and that it used an updated release of FrontierMath for its evaluations.

“The difference between our results and OpenAI’s might be due to OpenAI evaluating with a more powerful internal scaffold, using more test-time [computing], or because those results were run on a different subset of FrontierMath (the 180 problems in frontiermath-2024-11-26 vs. the 290 problems in frontiermath-2025-02-28-private),” Epoch wrote.

According to a post on X from the ARC Prize Foundation, an organization that tested a pre-release version of o3, the public o3 model “is a different model […] tuned for chat/product use,” corroborating Epoch’s report.

“All released o3 compute tiers are smaller than the version we [benchmarked],” ARC Prize wrote. Generally speaking, larger compute tiers can be expected to achieve better benchmark scores.

Granted, the fact that the public release of o3 falls short of OpenAI’s testing promises is something of a moot point: the company’s o3-mini-high and o4-mini models outperform o3 on FrontierMath, and OpenAI plans to debut a more powerful o3 variant, o3-pro, in the coming weeks.

It is, however, another reminder that AI benchmarks are best not taken at face value, particularly when the source is a company with services to sell.

As vendors race to capture headlines and mindshare with new models, benchmarking “controversies” are becoming a common occurrence in the AI industry.

In January, Epoch was criticized for waiting to disclose funding from OpenAI until after the company announced o3. Many academics who contributed to FrontierMath were not informed of OpenAI’s involvement until it was made public.

More recently, Elon Musk’s xAI was accused of publishing misleading benchmark charts for its latest AI model, Grok 3. Just this month, Meta admitted to touting benchmark scores for a version of a model that differed from the one the company made available to developers.



