Scientists have developed a new type of artificial intelligence (AI) model that reasons differently from most large language models (LLMs) such as ChatGPT, delivering markedly better performance on key benchmarks.
The new reasoning AI, called the Hierarchical Reasoning Model (HRM), is inspired by the hierarchical, multi-timescale processing of the human brain: the way different brain regions integrate information over different periods, from milliseconds to minutes.
Scientists at Sapient, an AI company based in Singapore, say the reasoning model achieves better performance and runs more efficiently than leading LLMs, because it requires far fewer parameters and training examples.
HRM has 27 million parameters and was trained on roughly 1,000 examples, the scientists said in a study uploaded June 26 to the preprint database arXiv (it has not yet been peer reviewed). By comparison, most advanced LLMs have billions or even trillions of parameters. Although the exact figure has not been published, some estimates put the newly released GPT-5 at 3 trillion to 5 trillion parameters.
A new way of thinking about AI
The study found that when the researchers tested HRM on the ARC-AGI benchmark, a notoriously difficult test designed to measure how close models are to achieving artificial general intelligence (AGI), the system achieved impressive results.
HRM scored 40.3% on ARC-AGI-1, versus 34.5% for OpenAI's o3-mini-high, 21.2% for Anthropic's Claude 3.7 and 15.8% for DeepSeek R1. On the more stringent ARC-AGI-2 test, HRM scored 5%, versus 3% for o3-mini-high, 1.3% for DeepSeek R1 and 0.9% for Claude 3.7.
Most advanced LLMs use chain-of-thought (CoT) reasoning, in which a complex problem is broken down into multiple, much simpler intermediate steps expressed in natural language. This emulates human thought processes by breaking elaborate problems into digestible chunks.
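To make the idea concrete, here is a minimal illustration of what CoT-style decomposition looks like. The word problem, the step wording and the variable names are invented for illustration; a real LLM would generate the intermediate steps itself as text.

```python
# Toy illustration of chain-of-thought decomposition: a complex question
# is answered via explicit, simple intermediate steps in natural language.
question = "Pens cost $2 for a pack of 3. How much do 12 pens cost?"

packs = 12 // 3   # step 1: 12 pens is 4 packs of 3
cost = packs * 2  # step 2: 4 packs at $2 each is $8

cot_steps = [
    f"12 pens is {packs} packs of 3.",
    f"{packs} packs at $2 each costs ${cost}.",
]
answer = f"${cost}"  # final answer emitted after the reasoning steps
```

Each intermediate step is plain language the model must generate correctly; as the next paragraph notes, that dependence on fragile step-by-step text is one of the criticisms Sapient levels at CoT.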
However, Sapient's scientists argue that CoT has key drawbacks: "fragile task decomposition, extensive data requirements, and high latency."
Instead, HRM executes sequential reasoning tasks in a single forward pass, using two coupled modules and no explicit supervision of the intermediate steps. A high-level module handles slow, abstract planning, while a low-level module handles rapid, detailed computations, much as the human brain processes information at different timescales in different regions.
It works by applying iterative refinement, a computing technique that improves a solution's accuracy by repeatedly refining an initial approximation over several short bursts of "thinking." After each burst, the model decides whether to keep thinking or to submit its current answer as "final."
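The two-timescale loop described above can be sketched in miniature. This is not Sapient's implementation; it is a toy example, using a Newton-method square-root task as a stand-in for the model's computations, where a fast inner loop plays the low-level module and a slow outer check plays the high-level continue-or-submit decision. All function names are invented for illustration.

```python
def low_level_step(x, a):
    """Fast, detailed computation: one Newton update toward sqrt(a)."""
    return 0.5 * (x + a / x)

def high_level_halt(x, a, tol=1e-9):
    """Slow, abstract check: is the current answer good enough to submit?"""
    return abs(x * x - a) < tol

def hierarchical_refine(a, bursts=8, low_steps=3):
    x = 1.0                              # initial rough approximation
    for _ in range(bursts):              # each burst = one "thinking" cycle
        for _ in range(low_steps):       # low-level module: rapid refinement
            x = low_level_step(x, a)
        if high_level_halt(x, a):        # high-level module: continue or submit?
            return x                     # submit as the "final" answer
    return x

print(hierarchical_refine(2.0))  # ≈ 1.4142135623730951 (sqrt of 2)
```

The point of the structure is that the expensive outer decision runs only once per burst, while the cheap inner updates run many times in between, mirroring the slow-planning/fast-computation split the researchers describe.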
HRM achieved near-perfect performance on challenging tasks such as complex Sudoku puzzles, which conventional LLMs failed to solve, and excelled at finding optimal paths through mazes.
Although the paper has not been peer-reviewed, the organizers of the ARC-AGI benchmark tried to reproduce the results themselves after the researchers open-sourced the model on GitHub.
They reproduced the numbers, but noted in a blog post that they made some surprising discoveries, including that the hierarchical architecture itself had minimal impact on performance; instead, it was a little-documented refinement process during training that drove the significant performance gains.