Meta exec denies the company artificially boosted Llama 4’s benchmark scores

On Monday, a Meta executive denied a rumor that the company had tuned its new AI models to perform well on specific benchmarks while concealing the models’ weaknesses.

Ahmad Al-Dahle, vice president of generative AI at Meta, denied in a post on X that Meta had trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.” In AI benchmarks, a test set is a collection of data used to evaluate a model’s performance after it has been trained. Training on a test set can misleadingly inflate a model’s benchmark scores, making the model appear more capable than it actually is.

Over the weekend, an unsubstantiated rumor began circulating on X and Reddit that Meta had artificially inflated its new models’ benchmark results. The rumor appears to have originated from a post on a Chinese social media site by a user who claimed to have resigned from Meta in protest of the company’s benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta’s decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed significant differences in the behavior of the publicly available Maverick compared with the model hosted on LM Arena.

Al-Dahle acknowledged that some users are seeing “mixed quality” from Maverick and Scout across the different cloud providers hosting the models.

“Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in,” Al-Dahle said. “We’ll keep working through our bug fixes and onboarding partners.”
