Researchers at the Center for AI Safety and Scale AI have published “Humanity’s Last Exam” (HLE), a benchmark designed to measure how close today’s most powerful artificial intelligence (AI) models come to matching or exceeding human-level knowledge across multiple domains.
The test began in January 2025, but scientists first outlined the framework and the thinking behind its design in a new study published in the journal Nature on January 28. It includes a corpus of 2,500 questions across more than 100 subjects, with input from more than 1,000 subject matter experts from 500 institutions in 50 countries.
At launch, researchers tested OpenAI’s GPT-4o and o1 models, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, and DeepSeek R1. OpenAI’s o1 system took the top spot with a score of just 8.3%.
Despite this poor initial performance, the researchers wrote at the time, “Given the rapid pace of AI development, the model’s HLE accuracy could exceed 50% by the end of 2025.”
As of February 12, 2026, the highest score ever achieved is 48.4%, recorded by Google’s Gemini 3 Deep Think. Human experts, on the other hand, score around 90% in their respective fields.
Testing the world’s smartest machines
Humanity’s Last Exam was intentionally designed to be extremely difficult for AI models. During the early stages of development, the researchers solicited submissions globally from subject matter experts across numerous disciplines.
The researchers applied strict submission criteria that required questions to be precise, unambiguous, solvable, and non-searchable. They didn’t want a model to cheat with a simple web search, or the question to already appear online, which would increase the likelihood that the answer was already in a model’s training data.
Each question submitted was fed to an AI model. The team automatically rejected questions that the model could answer correctly.
More than 70,000 questions were submitted, yielding approximately 13,000 that stumped the LLMs. These were then reviewed by a team of subject matter experts, approved by the research team, and presented to the scientific community for open feedback.
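The adversarial filtering step described above can be sketched in Python. Everything here is illustrative rather than the researchers’ actual pipeline: `ask_model` is a hypothetical stand-in for querying a frontier LLM, the submission format is assumed, and exact-match grading stands in for whatever answer-checking the team actually used (the real process queried several models and added human review).

```python
def filter_submissions(submissions, models, ask_model):
    """Keep only questions that every tested model answers incorrectly.

    submissions -- list of dicts with "question" and "answer" keys (assumed format)
    models      -- identifiers of the frontier models to test against
    ask_model   -- callable (model, question) -> model's answer string (hypothetical)
    """
    kept = []
    for q in submissions:
        answers = [ask_model(m, q["question"]) for m in models]
        # Reject the question if any model already gets it right;
        # only model-stumping questions advance to expert review.
        if all(a.strip().lower() != q["answer"].strip().lower() for a in answers):
            kept.append(q)
    return kept
```

In this sketch a single correct model answer is enough to discard a question, which mirrors the article’s description of automatically rejecting anything a model could already solve.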
Ultimately, the researchers narrowed the total submitted questions down to 2,500 questions, which generally fall within the scope of a doctoral-level exam.
An example trivia-style question from the exam asks: “In Greek mythology, who is Jason’s maternal great-grandfather?”
An example physics problem, on the other hand, asks about the relationships between forces in a scenario where a block rests on a frictionless horizontal rail and is attached to a stiff, massless rod of unknown length.
The breadth of questions and subject matter covered in Humanity’s Last Exam sets it apart from similar benchmarking tools, say its creators.
Common tests, such as the Massive Multitask Language Understanding (MMLU) dataset created with the participation of Center for AI Safety founder Dan Hendrycks, test only a small subset of expert-level domain knowledge, primarily focused on coding and math.
Even cutting-edge benchmarks like François Chollet’s ARC-AGI suite struggle to avoid the memorization and searchability issues that the creators of Humanity’s Last Exam say their new test addresses. For example, Gemini’s Deep Think achieved 84.6% on the ARC-AGI-2 benchmark just one week after failing to reach 50% on HLE.
The ultimate prize is general intelligence
While Humanity’s Last Exam is likely the most ambitious attempt yet to measure the range of capabilities of modern AI models against human experts, the study’s authors make clear that a high HLE score in no way signals the arrival of artificial general intelligence (AGI).
“HLE’s high accuracy demonstrates expert-level performance with respect to closed-ended testable questions and cutting-edge scientific knowledge, but by itself does not imply autonomous research capabilities or artificial general intelligence,” the scientists wrote in the study.
“Performing well on the HLE is a necessary but not sufficient criterion for a machine to reach true intelligence,” Manuel Schottdorf, a neuroscientist in the University of Delaware’s Department of Psychological and Brain Sciences, said in a recent statement. Schottdorf is one of many experts whose questions have been accepted into the HLE corpus.
“Machines need to be smart enough to answer these questions, but that alone doesn’t mean they’re truly intelligent.”