
New, challenging AGI tests cut down most AI models

By user | March 25, 2025

The ARC Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general intelligence of leading AI models.

So far, the new test, called ARC-AGI-2, has stumped most models.

According to the ARC Prize leaderboard, “reasoning” AI models such as OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2. Powerful non-reasoning models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, score around 1%.

The ARC-AGI tests consist of puzzle-like problems in which an AI must identify visual patterns in a collection of different-colored squares and generate the correct “answer” grid. The problems are designed to force an AI to adapt to new problems it has never seen before.
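To make the task format concrete, here is a minimal sketch in Python. It assumes the layout used by the publicly released ARC-AGI task files as we understand it: each task holds “train” demonstration pairs and “test” pairs, and each grid is a list of rows of integer color codes. The toy task, the candidate transforms, and the `infer_rule`/`score_task` helpers are hypothetical illustrations, and the exact-match scoring is a simplification, not the ARC Prize Foundation’s official evaluation.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # each cell is an integer color code

# Hypothetical toy task: the hidden rule is "mirror each row left-to-right".
toy_task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[0, 1], [0, 0]], "output": [[1, 0], [0, 0]]},
        {"input": [[0, 0], [2, 0]], "output": [[0, 0], [0, 2]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 4]], "output": [[0, 3], [4, 0]]},
    ],
}


def identity(g: Grid) -> Grid:
    return [list(row) for row in g]


def flip_vertical(g: Grid) -> Grid:
    # Reverse the order of rows (top becomes bottom).
    return [list(row) for row in reversed(g)]


def flip_horizontal(g: Grid) -> Grid:
    # Reverse each row (left becomes right).
    return [list(reversed(row)) for row in g]


# A tiny, fixed library of hypothetical transforms to search over.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_vertical": flip_vertical,
    "flip_horizontal": flip_horizontal,
}


def infer_rule(train: List[Dict[str, Grid]]) -> Optional[str]:
    """Return the first candidate that reproduces every demonstration pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in train):
            return name
    return None


def score_task(task: Dict[str, List[Dict[str, Grid]]]) -> float:
    """Fraction of test pairs whose predicted grid matches the answer exactly."""
    name = infer_rule(task["train"])
    if name is None:
        return 0.0
    fn = CANDIDATES[name]
    correct = sum(fn(pair["input"]) == pair["output"] for pair in task["test"])
    return correct / len(task["test"])


if __name__ == "__main__":
    print(infer_rule(toy_task["train"]), score_task(toy_task))
```

Running the script prints `flip_horizontal 1.0`. Searching a fixed library of transforms like this only works because the toy rule happens to be in the library; per the article, ARC-AGI-2 is built to require interpreting each new pattern rather than enumerating known ones.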

The ARC Prize Foundation had more than 400 people take ARC-AGI-2 to establish a human baseline. On average, “panels” of these people answered 60% of the test’s questions correctly, far better than any of the models’ scores.

Sample questions from ARC-AGI-2 (credit: ARC Prize).

In a post on X, Chollet argued that ARC-AGI-2 is a better measure of an AI model’s actual intelligence than the first iteration of the test, ARC-AGI-1. The ARC Prize Foundation’s tests aim to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on.

Unlike ARC-AGI-1, the new test prevents AI models from relying on “brute force” (extensive computing power) to find solutions. Chollet has previously acknowledged that this was a major flaw of ARC-AGI-1.

To address the first test’s flaws, ARC-AGI-2 introduces a new metric: efficiency. It also requires models to interpret patterns on the fly instead of relying on memorization.

“Intelligence is not solely defined by the ability to solve problems or achieve high scores,” ARC Prize Foundation co-founder Greg Kamradt wrote in a blog post. “The efficiency with which those capabilities are acquired and deployed is a crucial, defining component. The core question being asked is not just, ‘Can AI acquire [the] skill to solve a task?’ but also, ‘At what efficiency or cost?’”

ARC-AGI-1 went unbeaten for roughly five years until December 2024, when OpenAI released its advanced reasoning model, o3, which outperformed all other AI models and matched human performance on the evaluation. However, as noted at the time, o3’s performance gains on ARC-AGI-1 came with a hefty price tag.

The version of OpenAI’s o3 model that first reached new heights on ARC-AGI-1, o3 (low), scored 75.7% on that test but only 4% on ARC-AGI-2, using $200 worth of computing power per task.

Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2 (credit: ARC Prize).

ARC-AGI-2 arrives as many in the tech industry are calling for new, unsaturated benchmarks to measure AI progress. Hugging Face co-founder Thomas Wolf recently told TechCrunch that the AI industry lacks sufficient tests to measure the key traits of so-called artificial general intelligence, including creativity.

Alongside the new benchmark, the ARC Prize Foundation announced a new ARC Prize 2025 contest, challenging developers to reach 85% accuracy on ARC-AGI-2 while spending only $0.42 per task.
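The contest pairs an accuracy bar with a cost bar, which is the efficiency idea Kamradt describes. The sketch below checks a run summary against both thresholds quoted in the article (85% accuracy, $0.42 per task). The `RunSummary` type and `meets_prize_targets` function are hypothetical, and the check is a simplification: the actual contest rules define how cost and accuracy are measured.

```python
from dataclasses import dataclass

# Thresholds quoted in the article for the ARC Prize 2025 contest.
TARGET_ACCURACY = 0.85
MAX_COST_PER_TASK_USD = 0.42


@dataclass
class RunSummary:
    """Hypothetical summary of one evaluation run."""
    tasks_attempted: int
    tasks_solved: int
    total_cost_usd: float

    def __post_init__(self) -> None:
        if self.tasks_attempted <= 0:
            raise ValueError("tasks_attempted must be positive")

    @property
    def accuracy(self) -> float:
        return self.tasks_solved / self.tasks_attempted

    @property
    def cost_per_task(self) -> float:
        return self.total_cost_usd / self.tasks_attempted


def meets_prize_targets(run: RunSummary) -> bool:
    """A run qualifies only if it is both accurate enough and cheap enough."""
    return (run.accuracy >= TARGET_ACCURACY
            and run.cost_per_task <= MAX_COST_PER_TASK_USD)


if __name__ == "__main__":
    # o3 (low)'s reported ARC-AGI-2 figures (4%, $200 per task),
    # scaled to a hypothetical 100-task run.
    o3_low = RunSummary(tasks_attempted=100, tasks_solved=4, total_cost_usd=20_000.0)
    print(o3_low.accuracy, o3_low.cost_per_task, meets_prize_targets(o3_low))
```

By this check, the o3 (low) figures miss the accuracy target by a wide margin and exceed the per-task budget by a factor of roughly 476.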

