An AI system developed by Google DeepMind, Google's leading AI research lab, appears to outperform the average gold medalist at solving geometry problems from an international mathematics competition.
The system, called AlphaGeometry2, is an improved version of AlphaGeometry, a system DeepMind released last January. In a newly published study, the DeepMind researchers behind AlphaGeometry2 claim that it can solve 84% of all geometry problems from the last 25 years of the International Mathematical Olympiad (IMO), a math contest for high school students.
Why is DeepMind interested in a high-school-level math competition? Well, the lab thinks the key to more capable AI might lie in discovering new ways to solve challenging geometry problems, specifically Euclidean geometry problems.
Proving a mathematical theorem, that is, logically explaining why a theorem (e.g. the Pythagorean theorem) is true, requires both reasoning and the ability to choose from a range of possible steps toward a solution. These problem-solving skills could, if DeepMind is right, turn out to be a useful component of future general-purpose AI models.
Indeed, this past summer, DeepMind demonstrated a system that combined AlphaGeometry2 with AlphaProof, an AI model for formal math reasoning, to solve four out of six problems from the 2024 IMO. Beyond geometry problems, approaches like these could be extended to other areas of math and science, for example to aid with complex engineering calculations.
AlphaGeometry2 has several core elements, including a language model from Google's Gemini family of AI models and a "symbolic engine." The Gemini model helps the symbolic engine, which uses mathematical rules to infer solutions to problems, arrive at viable proofs for a given geometry theorem.
Olympiad geometry problems are based on diagrams that need "constructs" to be added before they can be solved, such as points, lines, or circles. AlphaGeometry2's Gemini model predicts which constructs might be useful to add to a diagram.
Essentially, AlphaGeometry2's Gemini model suggests steps and constructions to the engine in a formal mathematical language. A search algorithm then lets AlphaGeometry2 conduct multiple searches for a solution in parallel and store possibly useful findings in a common knowledge base.
AlphaGeometry2 considers a problem "solved" when it arrives at a proof that combines the Gemini model's suggestions with the symbolic engine's known principles.
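The division of labor described above, a language model proposing auxiliary constructs while a rule-based engine grinds out deductions, can be caricatured in a few lines of Python. Everything below (the function names, the string encoding of facts, the toy rule set) is invented for illustration and is not DeepMind's actual code or API.

```python
def deduce(facts, rules):
    """Symbolic-engine stand-in: forward-chain rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def solve(facts, rules, goal, proposer):
    """Alternate symbolic deduction with model-proposed constructions."""
    facts = deduce(facts, rules)
    for construct in proposer(facts, goal):  # stand-in for Gemini suggestions
        if goal in facts:
            break
        facts = deduce(facts | {construct}, rules)
    return goal in facts

# Toy problem: the goal is only provable after adding the auxiliary
# midpoint M, which no deduction rule can introduce on its own.
rules = [
    (("midpoint(M,A,B)",), "cong(AM,MB)"),
    (("cong(AM,MB)", "isosceles(CAB)"), "perp(CM,AB)"),
]
proposer = lambda facts, goal: ["midpoint(M,A,B)"]
print(solve({"isosceles(CAB)"}, rules, "perp(CM,AB)", proposer))  # True
```

The point of the toy is the interplay: deduction alone stalls until the "model" injects the right construct, after which the rule engine finishes the proof.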
The complexity of translating proofs into a format AI can understand has led to a dearth of usable geometry training data. So DeepMind created its own synthetic data to train AlphaGeometry2's language model, generating over 300 million theorems and proofs of varying complexity.
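One plausible way to generate such synthetic theorem data, sketched here as a heavily simplified hypothetical rather than DeepMind's actual pipeline, is to sample random premise sets, run a symbolic engine to exhaustion, and keep every derived fact as a theorem paired with the premises that produced it. All names and the string-based fact encoding below are invented.

```python
import random

def closure_with_proofs(facts, rules):
    """Forward-chain, recording which premises derived each new fact."""
    derived = {}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                derived[conclusion] = premises
                changed = True
    return derived

def synth_examples(rules, vocab, n_diagrams, seed=0):
    """Each fact derivable from a random premise set becomes one training pair."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_diagrams):
        start = set(rng.sample(vocab, 2))  # a random "diagram" of starting facts
        for theorem, _used in closure_with_proofs(set(start), rules).items():
            examples.append((frozenset(start), theorem))
    return examples

toy_rules = [(("a", "b"), "d"), (("d",), "e")]
pairs = synth_examples(toy_rules, ["a", "b", "c"], n_diagrams=20)
```

Scaled up with a rich rule set and real geometric predicates, a loop of this shape can churn out arbitrarily many (premises, theorem) pairs without any human-written data.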
The DeepMind team selected 45 geometry problems from IMO competitions over the past 25 years (2000 to 2024), including linear equations and equations that require moving geometric objects around a plane. They then "translated" these into a larger set of 50 problems. (For technical reasons, some problems had to be split in two.)
According to the paper, AlphaGeometry2 solved 42 of the 50 problems, clearing the average gold medalist score of 40.9.
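As a quick sanity check on those figures, 42 of 50 corresponds exactly to the 84% solve rate cited earlier, and it clears the gold medalists' average of 40.9 problems:

```python
solved, total = 42, 50
gold_medalist_avg = 40.9

print(solved / total)              # 0.84, matching the 84% figure above
print(solved > gold_medalist_avg)  # True
```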
There are limitations, to be sure. Due to technical quirks, AlphaGeometry2 cannot solve problems with a variable number of points, nonlinear equations, or inequalities. And while AlphaGeometry2 is technically not the first AI system to reach gold-medal-level performance in geometry, it is the first to achieve this with a problem set of this size.
AlphaGeometry2 also fared worse on a separate set of harder IMO problems. For an additional challenge, the DeepMind team selected a subset of problems (29 in total) that had been nominated for IMO exams by math experts but had not yet appeared in a competition. AlphaGeometry2 could only solve 20 of these.
Still, the study's findings are likely to fuel the debate over whether AI systems should be built on symbol manipulation, that is, manipulating symbols that represent knowledge using rules, or on the ostensibly more brain-like neural networks.
AlphaGeometry2 takes a hybrid approach: its Gemini model has a neural network architecture, while its symbolic engine is rule-based.
Proponents of neural network techniques argue that intelligent behavior, from speech recognition to image generation, can emerge from nothing more than massive amounts of data and computing power. In contrast to symbolic systems, which solve tasks by defining sets of symbol-manipulating rules dedicated to particular jobs (like editing a line in word processor software), neural networks try to solve tasks through statistical approximation and learning from examples.
Neural networks are the cornerstone of powerful AI systems like OpenAI's o1 "reasoning" model. But, claim supporters of symbolic AI, they're not the be-all and end-all. Symbolic AI might be better positioned to efficiently encode the world's knowledge, reason through complex scenarios, and "explain" how it arrived at an answer, these supporters argue.
"It is striking to see the contrast between continuing progress on these kinds of benchmarks and, meanwhile, language models, including more recent ones with 'reasoning,' continuing to struggle elsewhere," Vince Conitzer, a Carnegie Mellon University computer science professor specializing in AI, told TechCrunch. "I don't think it's all smoke and mirrors, but it illustrates that we still don't really know what behavior to expect of the next system. These systems are likely to be very impactful, so we urgently need to understand them, and the risks they pose, a lot better."
AlphaGeometry2 perhaps demonstrates that the two approaches, symbol manipulation and neural networks combined, are a promising path forward in the search for generalizable AI. Indeed, according to the DeepMind paper, o1, which also has a neural network architecture, couldn't solve the IMO problems that AlphaGeometry2 was able to answer.
This may not be the case forever, though. In the paper, the DeepMind team said it found preliminary evidence that AlphaGeometry2's language model was capable of generating partial solutions to problems without the help of the symbolic engine.
"[The] results support ideas that large language models can be self-sufficient without depending on external tools [like symbolic engines]," the DeepMind team wrote in the paper, "but until [model] speed is improved and hallucinations are completely resolved, the tools will stay essential for math applications."