Amidst the ongoing race in Silicon Valley to develop artificial intelligence models with human-like responses, Google has shared the latest data for its Gemini 1.5 Pro model. Like ChatGPT, Gemini uses artificial intelligence to solve problems, and this time around, Google is touting the mathematical capabilities of its product. A.I. models have been in the news for their ability to solve college-level tasks as a measure of their ‘intelligence’, and Google’s research paper shows the model improving its mathematics scores and successfully solving problems from the International Mathematical Olympiad.
Google’s Specialized Gemini Model For Math Problems Breezes Through Complex Problems
The details of Gemini’s latest math scores were shared in the model’s latest technical report last week. This rather hefty document shows that Google intended to emulate the training that mathematicians receive to solve complex problems, and to achieve this, the firm trained a specialized math variant of its Gemini A.I. model.
Gemini 1.5 Pro’s math variant was tested with several benchmarks. According to the document, Google relied on a variety of them to evaluate the output of its latest A.I. math model, including the MATH benchmark, the American Invitational Mathematics Examination (AIME) and Google’s in-house HiddenMath benchmark, which introduces new problems that the model has not encountered during its training.
According to Google, the mathematical Gemini 1.5 Pro’s performance on the MATH benchmark “is on par with a human-expert performance,” with the model solving significantly more problems on the AIME benchmark and improving its scores on the others when compared to the standard, non-mathematical Gemini 1.5 Pro.
Google also shared some examples of the problems that Gemini 1.5 Pro had solved. According to its paper, these are problems that “Gemini 1.5 Pro, GPT-4 Turbo, and all previous Gemini models answer[ed] incorrectly,” which raises the performance bar for the Google product.
Out of the three examples that it shared, two were solved by the math-specific Gemini 1.5 Pro, while one was solved incorrectly by the standard Gemini 1.5 Pro variant. The problems typically require the solver to recall basic algebraic formulations and rely on substitution along with other mathematical rules to derive the correct answers.
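To give a flavor of the kind of substitution step involved, here is a generic illustration; it is not one of the problems from Google’s report:

```latex
% Generic algebra-substitution example (illustrative only, not from Google's paper):
% given x + y = 10 and xy = 21, find x^2 + y^2.
\[
x^2 + y^2 = (x + y)^2 - 2xy = 10^2 - 2 \cdot 21 = 100 - 42 = 58.
\]
```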
Apart from the problems, Google also shares important details of Gemini 1.5 Pro’s benchmarks. These place the math-specific variant ahead of GPT-4 Turbo and Anthropic’s Claude in all five benchmark scores shared.
According to Google, its math-specialized variant is able to achieve “an accuracy of 80.6% on the MATH benchmark from a single sample, and an accuracy of 91.1% when sampling 256 solutions and selecting a candidate answer (rm@256),” a feat that places it on par with a human expert. In fact, the 91.1% score from the math model is significantly higher than the ‘SOTA’ (state of the art) score of a mere 6.9% just three years ago, according to Jeff Dean, chief scientist at Google DeepMind.
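The rm@256 figure refers to best-of-N sampling with a reward model: the system generates many candidate solutions and a separate scorer picks the one it deems most likely to be correct. Google has not published its implementation, so the sketch below is only a minimal illustration of that general idea, with hypothetical `generate_solution()` and `reward_score()` placeholders standing in for the Gemini model and its reward model.

```python
# Minimal sketch of best-of-N sampling with a reward model (the "rm@N" idea).
# generate_solution() and reward_score() are hypothetical stand-ins, not real APIs.
import random


def generate_solution(problem: str, rng: random.Random) -> str:
    # Placeholder: a real system would sample a full reasoning chain from the LLM.
    candidates = ["answer: 42", "answer: 41", "answer: 42 (checked)"]
    return rng.choice(candidates)


def reward_score(problem: str, solution: str) -> float:
    # Placeholder: a real reward model estimates how likely the solution is correct.
    return solution.count("42") + 0.5 * ("checked" in solution)


def best_of_n(problem: str, n: int = 256, seed: int = 0) -> str:
    """Sample n candidate solutions and return the one the reward model prefers."""
    rng = random.Random(seed)
    samples = [generate_solution(problem, rng) for _ in range(n)]
    return max(samples, key=lambda s: reward_score(problem, s))


if __name__ == "__main__":
    print(best_of_n("What is 6 * 7?", n=256))
```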