Fyself News

Passing this new AI exam (which its creators say is the world’s most difficult) could show the first signs of AGI

By user, February 27, 2026

Researchers at the Center for AI Safety and Scale AI have published “Humanity’s Last Exam” (HLE), a test designed to measure how close today’s most powerful artificial intelligence (AI) models are to matching or exceeding human-level knowledge across multiple domains.

The test launched in January 2025, but the scientists outlined the framework and the thinking behind its design in a new study published in the journal Nature on January 28. It includes a corpus of 2,500 questions across more than 100 subjects, with input from more than 1,000 subject matter experts from 500 institutions in 50 countries.

The exam consists of multiple-choice and short-answer questions, and each question has a known solution that is “clear and easily verifiable, but not readily answered by an Internet search.”

At launch, researchers tested OpenAI’s GPT-4o and o1 models, Google’s Gemini 1.5 Pro, Anthropic’s Claude 3.5 Sonnet, and DeepSeek R1. OpenAI’s o1 system took the top spot with a score of just 8.3%.

Despite this poor performance, the researchers wrote at the time, “Given the rapid pace of AI development, the model’s HLE accuracy could exceed 50% by the end of 2025.”

As of February 12, 2026, the highest score ever achieved is 48.4%, recorded by Google’s Gemini 3 Deep Think. Human experts, on the other hand, score around 90% in their respective fields.

Testing the world’s smartest machines

Humanity’s Last Exam was intentionally designed to be extremely difficult for AI models. During the early stages of development, researchers solicited submissions globally from subject matter experts across numerous disciplines.

The researchers applied strict submission criteria that required questions to be precise, unambiguous, solvable, and non-searchable. They didn’t want models to cheat by doing a simple web search, and they didn’t want questions that already appear online, which would increase the likelihood that a particular model had seen the answer in its training dataset.

Each question submitted was fed to an AI model. The team automatically rejected questions that the model could answer correctly.

More than 70,000 submissions were attempted, resulting in approximately 13,000 questions that stumped large language models (LLMs). These were then reviewed by a team of subject matter experts, approved by the research team, and presented to the scientific community for open feedback.

Ultimately, the researchers narrowed the pool down to 2,500 questions, which generally fall within the scope of a doctoral-level exam.
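The screening step described above, in which a submission is dropped when a model already answers it correctly, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers’ actual pipeline: the names keep_unsolved, normalize, and toy_model are hypothetical, the toy model stands in for a real AI system, and exact string matching is a simplification of however answers were actually graded. The figures at the end are the rounded counts quoted in this article.

```python
from typing import Callable, List, Tuple

# A submission is a (question, reference answer) pair. Hypothetical structure.
Submission = Tuple[str, str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def keep_unsolved(
    submissions: List[Submission], ask_model: Callable[[str], str]
) -> List[Submission]:
    """Keep only the submissions that the model answers incorrectly."""
    return [
        (question, reference)
        for question, reference in submissions
        if normalize(ask_model(question)) != normalize(reference)
    ]


def toy_model(question: str) -> str:
    """Stand-in for a real AI model: it only 'knows' one easy answer."""
    memorized = {"What is 2 + 2?": "4"}
    return memorized.get(question, "unknown")


submissions = [
    ("What is 2 + 2?", "4"),  # answered correctly, so it is rejected
    ("A placeholder expert-level question", "a placeholder answer"),  # kept
]
survivors = keep_unsolved(submissions, toy_model)


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage with one decimal."""
    return round(100 * part / whole, 1)


# Approximate funnel using the article's rounded figures.
stumped_share = percent(13_000, 70_000)  # share of attempts that stumped models
final_share = percent(2_500, 70_000)     # share that reached the final exam
print(len(survivors), stumped_share, final_share)
```

Because the article gives “more than 70,000” and “approximately 13,000,” the two percentages are rough: on these numbers, a little under a fifth of attempted questions stumped the models, and fewer than one in 25 made it into the final exam.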

An example of a trivia question on the exam is “In Greek mythology, who is Jason’s maternal great-grandfather?”

An example physics problem, on the other hand, asks about the relationships between the forces at play in a scenario where a block rests on a horizontal rail (so it can slide without friction) and is also attached to a rigid, massless rod of unknown length.

The breadth of questions and subject matter covered in Humanity’s Last Exam sets it apart from similar benchmarking tools, say its creators.

Common tests, such as the Massive Multitask Language Understanding (MMLU) dataset created with the participation of Center for AI Safety founder Dan Hendrycks, test only a small subset of expert-level domain knowledge, primarily focused on coding and math.

Even cutting-edge benchmarks like François Chollet’s ARC-AGI suite struggle to overcome the memorization and searchability issues that the creators of Humanity’s Last Exam say their new test addresses. For example, Gemini’s Deep Think achieved 84.6% on the ARC-AGI-2 benchmark just one week after failing to reach 50% on the HLE test.

The ultimate prize is general intelligence

While “Humanity’s Last Exam” is likely the most ambitious attempt to date to measure the wide range of capabilities of modern AI models against human experts, the study’s authors make clear that achieving a high score on the HLE in no way signals the arrival of artificial general intelligence (AGI).

“High accuracy on the HLE demonstrates expert-level performance with respect to closed-ended testable questions and cutting-edge scientific knowledge, but by itself does not imply autonomous research capabilities or artificial general intelligence,” the scientists wrote in the study.

“Performing well on the HLE is a necessary but not sufficient criterion for a machine to reach true intelligence,” Manuel Schottdorf, a neuroscientist in the University of Delaware’s Department of Psychological and Brain Sciences, said in a recent statement. Schottdorf is one of many experts whose questions have been accepted into the HLE corpus.

“Machines need to be smart enough to answer these questions, but that alone doesn’t mean they’re truly intelligent.”

