Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on the crowdsourced benchmark LM Arena. The incident prompted LM Arena’s maintainers to apologize, change their policies, and score the unmodified, vanilla Maverick.

As it turns out, it’s not very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128e-Instruct,” was ranked below models including OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

The release version of Llama 4 was added to LMArena after it was discovered they had cheated, but you probably didn’t see it because you have to scroll down to 32nd place.

– ρ:eeσn (@pigeon__s) April 11, 2025

Why the poor performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was optimized for chat, the company explained in a chart published last Saturday. Those optimizations evidently played well on LM Arena, where human raters compare the outputs of models and choose which they prefer.

As I’ve written before, LM Arena has, for a variety of reasons, never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark is not only misleading, it also makes it difficult for developers to predict how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with “all kinds of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat-optimized version that also performs well on LM Arena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their use cases. We look forward to seeing what they build and to their ongoing feedback.”
