“Proof by blackmail”: AI confidently solves “impossible” math problems. But will that convince the world’s top mathematicians?

The 2025 Secret Conference brought together the world’s leading mathematicians to test OpenAI’s latest large-scale language model, o4-mini.

Experts at the conference were surprised at how the model’s responses sounded like real mathematicians when performing complex proofs.

you may like

“Unfortunately, AI is much better at sounding like it has the right answer than it is at actually coming up with the answer. Right or wrong, AI always looks convincing,” Tao said.

He cautioned against accepting “evidence” from AI. “One of the things we’ve learned from using AI is that if you give it a goal, it will cheat like crazy to achieve that goal,” Tao said.

It may seem almost abstract to ask whether you can really “prove” a highly technical mathematical conjecture if you cannot understand the proof, but the answer can have important implications. After all, if you can’t trust a proof, you can’t develop further mathematical tools and techniques from that foundation.

For example, one of the major open questions in computational mathematics, P vs. NP, essentially asks whether problems whose solutions are easy to see are also easy to find in the first place. If we can prove it, we could transform scheduling and routing, streamline supply chains, accelerate chip design, and even accelerate drug discovery. The flip side of this is that the security of most modern cryptographic systems can be compromised by verifiable evidence. The answers to these questions are not only difficult, but downright dangerous.

Non-mathematicians may be shocked to learn that mathematical proofs derived by humans have always been, to some extent, a social construct for convincing others in the field that the argument is correct. After all, mathematical proofs are often accepted as truth if other mathematicians analyze them and deem them correct. That is, widely accepted evidence does not guarantee that a statement is irrefutably true. Andrew Granville, a mathematician at the University of Montreal, suspects there are also problems with some of the better-known and more scrutinized human-made mathematical proofs.

There is some evidence for that claim. “There are some well-known papers that are wrong because of minor linguistic issues,” Granville told LiveScience.

Perhaps the most well-known example is Andrew Wiles’ proof of Fermat’s Last Theorem. This theorem states that there are integers for which one square is equal to another square (such as 32+42=52), but there are no integers for which the same is true for cubes, fourths, or other higher powers.

Yellowed book pages display various paragraphs of text written in Latin and other languages. — Fermat proposed what is now known as the “final” theorem in 1637. His 1670 book Arithmetic contained Fermat’s commentary and was published after his death. (Image credit: Wikimedia Commons)

Wiles famously worked on his research for seven years in almost complete isolation and published his evidence in a much-publicized lecture series at Cambridge in 1993. When Wiles ended his last lecture with the immortal line, “I think I’ll stop there,” the audience erupted into thunderous applause and a bottle of champagne was uncorked to celebrate his accomplishment. Newspapers around the world declared victory for mathematicians over a 350-year-old problem.

A man with curly brown hair and wire-frame glasses in a black sweater stands in front of a green chalkboard with equations written in white scrawl, and in front of him is a crowd of people sitting — Andrew Wiles explaining the proof of the Taniyama-Shimura conjecture in 1993. Although his initial proof contained errors, he eventually found a final solution that led to the proof of Fermat’s Last Theorem. (Image provided by Science Photo Library)

However, during the peer review process, reviewers discovered a serious flaw in Wiles’ proof. He spent another year working on the problem and finally solved it.

However, for a short period of time the world believed that the proof was solved, when in fact it was not.

mathematical verification system

To prevent this type of problem, where proofs are accepted when they are actually incorrect, there is a movement to strengthen them with what mathematicians call formal verification languages.

These computer programs, the best-known examples of which are called lean, require mathematicians to convert proofs into a very precise form. The computer then executes every step and applies rigorous mathematical logic to ensure that the argument is 100% correct. When a computer encounters a step in a proof that it doesn’t like, it flags it and doesn’t let go. This coded formalism leaves no room for the linguistic misunderstandings that plagued Granville’s earlier proofs.

Kevin Buzzard, a mathematician at Imperial College London, is one of the leading proponents of formal verification. “I started this business because I was worried that human proofs were incomplete and wrong, and that we humans weren’t doing a good job of documenting our arguments,” Buzzard told Live Science.

In addition to validating existing human proofs, mathematicians say AI working in conjunction with programs like Lean could be a game-changer.

Regarding the problem of AI sometimes coming up with proofs that seem convincing but end up being wrong, Tao said, “If you force the output of an AI to produce something in a formally verified language, that in principle solves most problems.”

“There are papers in mathematics where no one understands the whole paper. You know, there are papers with 20 authors, and each author understands his part. No one understands the whole. And that’s okay. That’s how it works.”
Kevin Buzzard, Mathematician, Imperial College London

Buzzard agreed. “You want to think that instead of just writing the output of the model to the system, you might be able to convert it to Lean and be able to run it in Lean,” he said. He envisioned a back and forth between Lean and the AI, with Lean pointing out a mistake and the AI trying to fix it.

If AI models can be run in a formal verification language, AI could tackle some of mathematics’ toughest problems by finding connections that are beyond the reach of human creativity, experts told Live Science.

“AI is very good at finding connections between areas of mathematics that humans wouldn’t necessarily think would connect,” Mark Luckenby, a mathematician at the University of Oxford, told Live Science.

Evidence that no one understands?

Taking the idea of formally verified AI proofs to its logical extreme, there is a realistic future in which AI develops “objectively correct” proofs that are too complex for humans to understand.

This is a difficult problem for mathematicians in a completely different way. It raises fundamental questions about the purpose of mathematics as a discipline. After all, what’s the point in proving something that no one understands? And if so, does it mean that the state of human knowledge has further improved?

Of course, the concept that proofs are too long and complex for anyone on the planet to understand is not new to mathematics, Buzzard said.

“There are some papers in mathematics where no one understands the whole paper. You know, there are papers with 20 authors, and each author understands their part,” Buzzard told Live Science. “Nobody knows everything, and that’s okay. That’s how it works.”

Buzzard also pointed out that proofs that rely on computers to fill in gaps are not new. “We’ve been doing computer-assisted proofs for decades,” Buzzard said. For example, the four-color theorem states that if you are dividing a map into countries or regions, you do not need more than four different colors to shade the map so that adjacent regions are never the same color.

A map of the continental United States with each state shown in one of four colors: orange, pink, green, or yellow. — The four-color theorem states that any map can be colored with only four colors so that no two colors touch each other. By 2005, it was formally proven, primarily using computers. (Image provided by Science Photo Library)

Almost 50 years ago, in 1976, mathematicians divided the problem into thousands of small testable cases and wrote a computer program to test each one. Mathematicians could rest assured that their proofs were correct as long as they were sure that the code they had written was correct. The first computer-aided proof of the four-color theorem was published in 1977. Confidence in the proof built up gradually over the years and strengthened to near universal acceptance when a simpler but still computer-aided proof was created in 1997 and a formally verified, machine-checked proof was published in 2005.

“The four-color theorem was proven on a computer,” Buzzard pointed out. “People were very upset about it. But now it’s accepted. It’s in the textbooks.”

unknown territory

But these examples of computer-assisted proofs and mathematical teamwork feel fundamentally different from AI proposing, adapting, and verifying proofs all on its own. This is probably proof that no human or human team wants to understand.

Whether mathematicians welcome it or not, AI is already reshaping the very nature of proofs. For centuries, the act of generating and verifying proofs has been a human endeavor, an argument created to persuade other human mathematicians. We are approaching a situation where machines can generate sensitive logic, verified by formal systems, that even the best mathematicians cannot follow.

In that future scenario, if that happens, AI would perform every step, from proposal to testing to verifying evidence, “then you’ve won,” Luckenby said. “You proved something.”

However, this approach raises deep philosophical questions. If proofs become comprehensible only to computers, will mathematics remain a human task or will it evolve into something entirely different? And one might wonder what all that means, Luckenby points out.

Source link

What's Hot

China’s brain-computer interface industry moves forward

6 days left until disruption rate locks in at lowest level in 2026

A 9,000 pound monster that I don’t want to give back.

“Proof by blackmail”: AI confidently solves “impossible” math problems. But will that convince the world’s top mathematicians?

Why is there a desert next to the sea?

A 2,000-year-old skull reveals ancient Vietnamese people permanently blackened their teeth – a stylish habit that continues today

Boeing Starliner mission failure was the highest-ranking accident that put stranded astronauts at risk, says report

China’s brain-computer interface industry moves forward

6 days left until disruption rate locks in at lowest level in 2026

A 9,000 pound monster that I don’t want to give back.

Why is there a desert next to the sea?

Castilla-La Mancha Ignites Innovation: fiveclmsummit Redefines Tech Future

Local Power, Health Innovation: Alcolea de Calatrava Boosts FiveCLM PoC with Community Engagement

The Future of Digital Twins in Healthcare: From Virtual Replicas to Personalized Medical Models

Human Digital Twins: The Next Tech Frontier Set to Transform Healthcare and Beyond

What's Hot

“Proof by blackmail”: AI confidently solves “impossible” math problems. But will that convince the world’s top mathematicians?

Evidence is a social construct

mathematical verification system

Evidence that no one understands?

unknown territory

Related Posts