On a weekend in mid-May, a secret mathematics conclave convened. Thirty of the world’s most renowned mathematicians traveled to Berkeley, California, some from as far away as the U.K. The group’s members faced off against a “reasoning” chatbot tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover that it could answer some of the world’s hardest solvable problems. “I have colleagues who literally said that these models are approaching mathematical genius,” says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting.
The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to make highly intricate deductions. Google’s equivalent, Gemini 2.5 Flash, has similar capabilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter, more nimble models trained on specialized datasets with stronger reinforcement from humans. This approach leads to a chatbot that can delve far deeper into complex mathematical problems than traditional LLMs.
To track o4-mini’s progress, OpenAI previously tasked Epoch AI, a nonprofit that benchmarks LLMs, with coming up with 300 mathematics questions whose solutions had not yet been published. Even traditional LLMs can answer many complex mathematical questions correctly. Yet when Epoch AI asked several such models these questions, which were unlike those they had been trained on, the most successful were able to solve less than 2 percent. But o4-mini would prove to be very different.
Epoch AI hired Elliot Glazer, who had recently completed his Ph.D. in math, to join the new collaboration on the benchmark, known as FrontierMath, in September 2024. The project’s questions span tiers of difficulty, with the first three tiers covering undergraduate-, graduate- and research-level challenges. By April 2025, Glazer found that o4-mini could solve about 20 percent of the questions. He then moved on to a fourth tier: a set of questions that would be challenging even for academic mathematicians. Only a small number of people in the world are capable of developing such questions, let alone answering them. Participating mathematicians were required to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional email, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.
Each problem that o4-mini couldn’t solve would earn the mathematician who came up with it a $7,500 reward. The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI held the in-person meeting on Saturday, May 17, and Sunday, May 18. There, participants would finalize the last batch of challenge questions. The 30 participants were divided into groups of six. For two days, the scholars competed against one another to devise problems that they could solve but that would trip up the AI reasoning bot.
By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was hampering the group’s progress. “I came up with a problem which experts in my field would recognize as an open question in number theory,” he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the relevant literature in the field. Then it wrote on the screen that it wanted to first try solving a simpler “toy” version of the question in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but cheeky solution. “It was starting to get really cheeky,” says Ono, who is also a freelance mathematical consultant for Epoch AI. “And at the end, it says, ‘No citation necessary because the mystery number was computed by me!’”
Defeated, Ono jumped onto Signal early on Sunday morning to alert the rest of the participants. “I wasn’t prepared to compete with an LLM like this,” he says.
The group ultimately managed to find 10 questions that stymied the bot, but the researchers were astonished by how far AI had progressed in the span of one year. Ono compared it to working with a “strong collaborator.” Yang Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer in using AI in mathematics, says, “This is what a very good graduate student would be doing.”
The bot was also much faster than a professional mathematician, taking just minutes to complete what would take such a human expert weeks or months.
Sparring with o4-mini was thrilling, but its progress was also alarming. Ono and He express concern that o4-mini’s results may be trusted too much. “There’s proof by induction, proof by contradiction, and then proof by intimidation,” He says. “If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”
By the end of the meeting, the group began to think about what the future might look like for mathematicians. Discussions turned to the inevitable “tier five”: questions that even the best mathematicians couldn’t solve. If AI reaches that level, the role of mathematicians will undergo a sharp change. For example, mathematicians may shift to simply posing questions and interacting with reasoning bots to help them discover new mathematical truths, much as a professor does with graduate students. Ono therefore predicts that fostering creativity in higher education will be key to keeping mathematics going for future generations.
“I’ve been telling my colleagues that it’s a grave mistake to say that generalized artificial intelligence will never come, [that] it’s just a computer,” says Ono.
This article was first published in Scientific American. ©ScientificAmerican.com. Unauthorized reproduction is prohibited.