Monday, December 23, 2024

Google DeepMind just made another breakthrough in AI models that can do math

Google DeepMind says it has achieved a breakthrough in building an AI system that can handle complex mathematical problems.

The research division, which is part of Alphabet-owned Google, announced Thursday that it has created a software system that combines multiple AI models and can score well enough on the International Mathematical Olympiad (IMO), a global test of high school students’ mathematical talents, to place in the top quartile of contestants taking the test. That would be good enough to earn a silver medal in the competition.

A boastworthy milestone in the machine-versus-mathlete contest, the feat also opens new possibilities for combining different approaches to AI in order to create more capable hybrid AI systems—something that Google said could eventually make its way into commercial products like its line-up of Gemini AI tools.

The news builds on a system that the AI research lab unveiled in January, called AlphaGeometry, which could solve geometry problems from the IMO about as well as top high school students. The new system—which combines a new model called AlphaSolver with an updated and improved AlphaGeometry 2—can tackle many different kinds of mathematical problems and develop sophisticated answers.

The new system answered the most difficult IMO question, which only five of the 609 human contestants in last week’s competition managed to solve. That said, the new system is not perfect: on two of the six problems in the IMO it did not manage to find a solution, and on one problem it took three days to reach the correct answer. Human competitors have to solve three questions in four-and-a-half hours, so they can average no more than 90 minutes per question.

Google DeepMind researchers said the new system is a step towards more powerful AI models that will be able to plan and reason about complex tasks, although they cautioned that the method would work best in situations where there was a clear way to determine whether an output was valid. This is the case, for example, in software coding, where the code will only compile and run if it is valid. David Silver, one of the Google DeepMind researchers who worked on the new system, said it might also work in areas where humans could provide unambiguous feedback about whether the solution the AI produced was a good one.

Google DeepMind said it would incorporate insights from the new system into future versions of its Gemini AI models, although it did not say exactly how this would be done or how soon Gemini might gain these upgraded mathematical abilities.

Silver acknowledged that in many real-world situations the validity of an answer is highly subjective, or the soundness of a solution can only be determined after a long period of time. He said this would make it harder to apply the methods Google DeepMind used for the IMO problems to successfully take on these kinds of real-world problems.

AlphaZero to the rescue

Unlike other well-known AI models that consist of a single large neural network—a kind of AI software loosely based on the human brain—the AlphaSolver system involves multiple neural networks, each performing different functions.

A large language model (LLM)—in this case Google’s Gemini model—is used as one part of the process. But the LLM does not itself do the mathematical reasoning. The LLMs that underpin popular AI chatbots—such as Gemini, OpenAI’s ChatGPT, Anthropic’s Claude, and Meta’s AI chatbot—have struggled to solve math problems unless given access to outside tools, such as calculators or specialized math software.

Instead, the LLM is fine-tuned to translate text-based mathematical problems into a formal mathematical language. It then passes the problem to a different AI model, Google DeepMind’s AlphaZero, which was developed in 2017 and originally used to learn to play the strategy board games chess, Go, and shogi at superhuman levels. But it turns out that AlphaZero can be used to puzzle out all kinds of problems in systems with clear rules and an easy way of keeping score.
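
To give a sense of what that translation step involves, here is a toy illustration, not taken from Google DeepMind’s system, of how a far simpler informal claim, “the sum of two even natural numbers is even,” might be written in Lean, the formal language described in the next paragraph, so that a proof checker can verify it mechanically. Real IMO formalizations are far more involved.

```lean
-- Toy illustration only: an informal statement rendered as a formal Lean 4
-- theorem. Once the statement is in this machine-checkable form, any candidate
-- proof can be accepted or rejected automatically by the compiler.
import Mathlib

theorem sum_of_evens_is_even (a b : ℕ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  exact Even.add ha hb
```

Having statements in this machine-checkable form is what makes the training scheme described below possible: proposed proof steps can be judged valid or invalid with no human in the loop.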

In this case, the AlphaZero component is trained to suggest proof steps for the problem in Lean, a mathematical programming language. If a proof step is valid, it will compile correctly in Lean; if it isn’t, it won’t. This provides a reward signal—much like points in a video game—to AlphaZero. In this way, the AlphaZero component of AlphaSolver learns by trial and error to take steps that are more likely to result in valid solutions. AlphaSolver was trained on about one million examples of IMO problems in the weeks leading up to the competition, Google DeepMind said, and it continued to improve while working on the IMO contest problems.
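
The loop described above can be sketched conceptually in a few lines of Python. This is not DeepMind’s code, and every name in it is an illustrative placeholder: a policy proposes candidate proof steps, a checker standing in for the Lean compiler accepts or rejects each one, and acceptance serves as the reward that shapes future proposals.

```python
import random

def checker_accepts(step: str) -> bool:
    """Stand-in for the Lean compiler: in this toy, only one step is 'valid'."""
    return step == "valid_step"

def propose_step(scores: dict) -> str:
    """Sample a candidate proof step, favouring steps with higher learned scores."""
    steps, weights = zip(*scores.items())
    return random.choices(steps, weights=weights, k=1)[0]

def train(episodes: int = 1000) -> dict:
    # Start with no preference between the candidate proof steps.
    scores = {"valid_step": 1.0, "invalid_step": 1.0}
    for _ in range(episodes):
        step = propose_step(scores)
        reward = 1.0 if checker_accepts(step) else 0.0  # like points in a game
        scores[step] += reward  # reinforce steps the checker accepts
    return scores

if __name__ == "__main__":
    # After training, the step the checker accepts carries a far higher score.
    print(train())
```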

In cases where the problem involved geometry, it was given instead to AlphaGeometry 2. AlphaGeometry 2 is also a hybrid system, combining an LLM component with a component that uses symbolic reasoning. The new AlphaGeometry could solve 83% of IMO geometry problems, compared to just 53% for its predecessor. In one case, AlphaGeometry was able to solve a highly complex geometry problem in just 19 seconds, a feat more akin to a flash of inspiration than to a brute-force approach based on endless trial and error. In another case, the proof AlphaGeometry offered initially confused some mathematicians who examined it, but they determined it was actually an elegant and highly unusual way of solving the problem.
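
As a very rough picture of what symbolic reasoning means in this context (and emphatically not a description of AlphaGeometry 2’s actual deduction engine), a symbolic component can be thought of as a rule-based loop that keeps applying geometric rules to known facts until nothing new follows. The single rule and the facts below are deliberately simplified placeholders.

```python
# Toy forward-chaining deduction loop (illustrative only, not AlphaGeometry 2).
facts = {("parallel", "AB", "CD"), ("parallel", "CD", "EF")}

def apply_rules(known: set) -> set:
    """One simplified rule: parallelism is transitive (AB ∥ CD and CD ∥ EF imply AB ∥ EF)."""
    derived = set()
    for (r1, a, b) in known:
        for (r2, c, d) in known:
            if r1 == r2 == "parallel" and b == c and a != d:
                derived.add(("parallel", a, d))
    return derived - known

# Keep applying the rule until no new facts can be derived.
while True:
    new_facts = apply_rules(facts)
    if not new_facts:
        break
    facts |= new_facts

print(facts)  # now also contains ("parallel", "AB", "EF")
```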

Impact on human mathematicians

Pushmeet Kohli, who heads Google DeepMind’s AI for science division, said he saw AlphaSolver and AlphaGeometry 2 primarily as tools for helping mathematicians in their work. Silver said he did not see these new mathematical AIs challenging the relevance of academic mathematicians.

But Timothy Gowers, who is a director of research in mathematics at the University of Cambridge and a past winner of the Fields Medal—a prize that is awarded only once every four years to two to four mathematicians under the age of 40 who have contributed the most to the field—reviewed the proofs AlphaSolver and AlphaGeometry 2 produced and said he came away impressed. “I could recognize familiar-looking arguments that had come out of the system,” he said.

He also said that some of the problems required him, as a human mathematician, “to dig quite deep” and come up with what he called “a sort of magic key” that suddenly turns a problem that looks unsolvable into one that is eminently solvable. Gowers said he was surprised that the system had discovered a few of these magic keys, because his intuition was that they would be difficult to stumble upon by naïve trial and error without any underlying understanding of the mathematical principles involved. But he said he reserved judgment as to whether this meant AlphaSolver had actually developed something akin to mathematical intuition. He said more research would be needed to understand exactly how the system managed to puzzle out answers to the IMO problems.

Gowers noted that IMO problems were much simpler than what research mathematicians work on. But, compared to Kohli and Silver, Gowers was far less sanguine about what the future would hold if AI models kept improving at the current clip. “I actually think that when computers become really good at finding extremely hard proofs, that’s more or less game over for mathematical research,” he said. “I’m not trying to suggest that we’re all that close to that at the moment, but I just, I’m thinking a long way ahead, but how long ahead that really is, is very hard to say.”
