Fyself News
Startups

Meta’s benchmarks for its new AI models are a bit misleading

By user, April 6, 2025

Maverick, one of the new flagship AI models Meta released on Saturday, ranks second on LM Arena, a test in which human raters compare the outputs of models and choose which they prefer. However, the version of Maverick that Meta deployed to LM Arena appears to differ from the version that is widely available to developers.

As some AI researchers pointed out on X, Meta noted in its announcement that the Maverick on LM Arena is an “experimental chat version.” Meanwhile, a chart on the official Llama website discloses that Meta’s LM Arena testing was conducted using “Llama 4 Maverick optimized for conversation.”

As I have written before, LM Arena has, for a variety of reasons, never been the most reliable measure of an AI model’s performance. However, AI companies generally have not customized or fine-tuned their models to score better on LM Arena, or at least have not admitted to doing so.

The problem with tailoring a model to a benchmark, withholding it, and then releasing a “vanilla” variant of the same model is that developers can no longer predict accurately how the model will perform in a given context. It is also misleading. Ideally, benchmarks, woefully inadequate as they are, provide a snapshot of a single model’s strengths and weaknesses across a variety of tasks.

In fact, researchers on X have observed significant differences between the behavior of the publicly available Maverick and that of the model hosted on LM Arena. The LM Arena version seems to use a lot of emojis and give very long answers.

Okay, Llama 4 is def a little cooked lol.

– Nathan Lambert (@natolambert) April 6, 2025

For some reason, the Llama 4 model in Arena uses more emojis

On together.ai, it seems better: pic.twitter.com/f74odx4ztt

– Tech Dev Notes (@techdevnotes) April 6, 2025

We have reached out for comment to Meta and to Chatbot Arena, the organization that maintains LM Arena.



