Fyself News
Startups

Meta exec denies the company artificially boosted Llama 4’s benchmark scores

By user | April 7, 2025

On Monday, a Meta executive denied rumors that the company had tuned its new AI models to score well on specific benchmarks while hiding the models’ weaknesses.

Ahmad Al-Dahle, vice president of generative AI at Meta, said in a post on X that it is not true that Meta trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.” In AI benchmarks, a test set is a collection of data used to evaluate a model’s performance after it has been trained. Training on a test set can misleadingly inflate a model’s benchmark scores, making the model appear more capable than it actually is.
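To illustrate why training on a test set inflates scores, here is a minimal Python sketch. It is a toy, not a description of how Llama 4 or any real benchmark works: the MemorizingModel class, the make_examples helper, and the random data are all hypothetical and exist only for this illustration. The labels are random, so nothing can genuinely be learned; a model that has seen the test set still scores perfectly simply by recalling the answers.

```python
import random


def make_examples(n, seed):
    """Build n (question, answer) pairs with random 0/1 answers.

    Hypothetical toy data: the answers are random, so there is no
    real pattern for a model to learn.
    """
    rng = random.Random(seed)
    return [(f"q{seed}-{i}", rng.randint(0, 1)) for i in range(n)]


class MemorizingModel:
    """Toy 'model' that only memorizes the exact examples it was trained on."""

    def __init__(self):
        self.memory = {}

    def train(self, examples):
        # Store every (question, answer) pair verbatim.
        self.memory.update(examples)

    def predict(self, question):
        # Recall the stored answer, or fall back to guessing 0 if unseen.
        return self.memory.get(question, 0)


def accuracy(model, examples):
    """Fraction of examples the model answers correctly."""
    correct = sum(model.predict(q) == a for q, a in examples)
    return correct / len(examples)


train_set = make_examples(1000, seed=1)
test_set = make_examples(1000, seed=2)  # disjoint from train_set

# Trained only on the training data, as intended.
clean = MemorizingModel()
clean.train(train_set)

# Trained on the training data plus the test set (contamination).
contaminated = MemorizingModel()
contaminated.train(train_set + test_set)

clean_score = accuracy(clean, test_set)
contaminated_score = accuracy(contaminated, test_set)

print(f"clean model on test set:        {clean_score:.2f}")
print(f"contaminated model on test set: {contaminated_score:.2f}")
```

In this sketch the contaminated model reaches an accuracy of 1.0 on the test set while the clean model stays near chance, even though neither is more capable than the other on questions it has never seen.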

Over the weekend, an unsubstantiated rumor that Meta had artificially boosted its new models’ benchmark results began to circulate on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site by a user who claimed to have resigned from Meta in protest of the company’s benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta’s decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed significant differences in the behavior of the publicly available Maverick compared with the model hosted on LM Arena.

Al-Dahle acknowledged that some users are seeing “mixed quality” from Maverick and Scout across the various cloud providers that host the models.

“Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in,” Al-Dahle said. “We’ll keep working through our bug fixes and onboarding partners.”

