Fyself News
Startups

A high school student built a website that lets you challenge AI models to a Minecraft build-off

By user | March 20, 2025 | 3 min read

As conventional AI benchmarking techniques prove inadequate, AI builders are turning to more creative ways to assess the capabilities of generative AI models. For one group of developers, that means Minecraft, the sandbox building game owned by Microsoft.

The Minecraft Benchmark (or MC-Bench) website was developed collaboratively to pit AI models against each other in head-to-head challenges, in which they respond to prompts with Minecraft creations. Users vote on which model did a better job, and only after voting can they see which AI made each Minecraft build.
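The article does not say how MC-Bench turns votes into rankings. As a minimal sketch, assuming an Elo-style rating (a common choice for head-to-head comparisons, not a confirmed detail of the site), blind pairwise votes could be aggregated like this. The model names, the starting rating of 1000, and the K-factor of 32 are all illustrative assumptions.

```python
from collections import defaultdict


def update_elo(ratings, winner, loser, k=32.0):
    """Apply one Elo update for a single head-to-head vote."""
    r_win, r_lose = ratings[winner], ratings[loser]
    # Probability the eventual winner was expected to win, given current ratings.
    expected_win = 1.0 / (1.0 + 10 ** ((r_lose - r_win) / 400.0))
    delta = k * (1.0 - expected_win)
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta


def leaderboard(votes, start=1000.0):
    """Rank models from a sequence of (winner, loser) votes, best first."""
    ratings = defaultdict(lambda: start)
    for winner, loser in votes:
        update_elo(ratings, winner, loser)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each tuple is (model the user preferred, the other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
board = leaderboard(votes)
for name, rating in board:
    print(f"{name}: {rating:.1f}")
```

Because each update moves the same number of points from the loser to the winner, the total rating across models stays constant, and an upset win against a higher-rated model moves the ratings more than an expected win does.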

Image credit: Minecraft Benchmark

For Adi Singh, the 12th grader who started MC-Bench, the value of Minecraft is not so much the game itself as the familiarity people have with it. Even those who haven’t played the game can still judge which blocky representation of a pineapple is better realized.

“Minecraft allows people to see the progress [of AI development],” Singh told TechCrunch.

MC-Bench currently lists eight volunteer contributors. According to the MC-Bench website, Anthropic, Google, OpenAI, and Alibaba have subsidized the project’s use of their products to run benchmark prompts, but the companies are not otherwise affiliated with it.

“We’re doing simple builds to reflect on how far we’ve come from the GPT-3 era,” Singh said. “Games could be a medium for testing agentic reasoning that is safer than real life and more controllable for testing purposes.”

Other games such as Pokémon Red, Street Fighter, and Pictionary have been used as experimental benchmarks for AI, in part because the art of benchmarking AI is notoriously tricky.

Researchers often test AI models on standardized evaluations, but many of these tests give AI a home-field advantage. Because of the way they are trained, models are naturally gifted at certain narrow kinds of problem solving, particularly those that require rote memorization or basic extrapolation.

Put simply, it is hard to know what it means that OpenAI’s GPT-4 can score in the 88th percentile on the LSAT but cannot work out how many Rs are in the word “strawberry.” Anthropic’s Claude 3.7 Sonnet achieved 62.3% accuracy on a standardized software engineering benchmark, but it is worse at playing Pokémon than most five-year-olds.

Image credit: Minecraft Benchmark

MC-Bench is technically a programming benchmark, since the models are asked to write code to create the prompted build, such as “snowman” or “attractive tropical beach hut.”
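To make “writing code to create a build” concrete, here is a purely illustrative sketch of what such a program might look like for the snowman prompt. The article does not describe the interface MC-Bench gives to models, so everything here is an assumption: the build is represented as a plain mapping from (x, y, z) coordinates to block names, and the helper names are hypothetical.

```python
def sphere(center, radius, block):
    """Return {(x, y, z): block} for every grid cell inside a blocky sphere."""
    cx, cy, cz = center
    placements = {}
    for x in range(cx - radius, cx + radius + 1):
        for y in range(cy - radius, cy + radius + 1):
            for z in range(cz - radius, cz + radius + 1):
                inside = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
                if inside:
                    placements[(x, y, z)] = block
    return placements


def build_snowman():
    """Stack three shrinking snow spheres and add a one-block nose."""
    blocks = {}
    blocks.update(sphere((0, 3, 0), 3, "snow_block"))   # base
    blocks.update(sphere((0, 8, 0), 2, "snow_block"))   # torso
    blocks.update(sphere((0, 11, 0), 1, "snow_block"))  # head
    blocks[(0, 11, 2)] = "orange_wool"                  # nose, just in front of the head
    return blocks


snowman = build_snowman()
print(f"{len(snowman)} blocks placed")
```

A voter never needs to read code like this. They only see the rendered result, which is what lets non-programmers judge the output.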

However, most MC-Bench users find it easier to judge whether a snowman looks better than to dig into code, which gives the project wider appeal.

Whether these scores say much about AI usefulness is, of course, up for debate. Singh asserts that they are a strong signal.

“The current leaderboard reflects my own experience of using these models quite closely, which is unlike a lot of pure text benchmarks,” Singh said. “Maybe [MC-Bench] could be useful to companies to know if they’re heading in the right direction.”

