Google on Thursday opened new fronts in the GenAI technology war, releasing grounding and context caching capabilities for its multimodal Gemini large language models to better ensure accuracy of results and use less compute power.
The tech giant also launched Imagen 3, the latest version of its diffusion imaging model, in early preview for users of the Vertex AI platform. It brings faster processing, better prompt comprehension and digital watermarking. Google also made generally available the Gemini 1.5 Flash model with a 1 million-token context window.
The moves come amid a frenzy of technology advances as Google and rival vendors Microsoft and its partner OpenAI, Meta, AWS and smaller independent AI vendors vie for supremacy in the surging GenAI market.
LLM output evidence
Grounding, or providing citations or links to sources underlying LLM outputs, has become a buzzword in the GenAI universe as vendors and users look for ways to reduce or eliminate the hallucinations, or inaccuracies, that LLMs are prone to.
On the grounding front, Google has jumped ahead of its main GenAI competitors, said Andy Thurai, an analyst with Constellation Research.
“With grounding, context caching and size, they announced some things that others aren’t thinking about,” Thurai said. “They are urging others to catch up.”
Google’s grounding approach begins with Google Search. The grounding feature checks model outputs against search results and returns a percentage-based accuracy score.
“This assumes that your Google search results are accurate, which Google says they are,” Thurai said. “But if the search results are bad, then your model output will be bad.”
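Google has not published the mechanics behind the score, but the developer-facing pattern is straightforward. The following is a minimal sketch assuming the Vertex AI Python SDK’s grounding tool interface; the project settings and prompt are placeholders, and the grounding metadata returned with each response carries the supporting source links and scores.

```python
import vertexai
from vertexai.generative_models import GenerativeModel, Tool, grounding

# Placeholder project and region; substitute your own.
vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")

# Attach Google Search as the grounding source for this request.
search_tool = Tool.from_google_search_retrieval(
    grounding.GoogleSearchRetrieval()
)

response = model.generate_content(
    "What did Google announce for Vertex AI this week?",  # placeholder prompt
    tools=[search_tool],
)

print(response.text)
# Grounding metadata includes the supporting search links and scores.
print(response.candidates[0].grounding_metadata)
```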
Thurai said he has higher hopes for third-party grounding, expected to be available on Vertex AI later this year with Moody’s for financial data, Thomson Reuters for news and ZoomInfo for company data. High-fidelity mode grounding, now in experimental preview powered by a version of Gemini 1.5 Flash, will let users pick their own data confirmation source.
It’s likely that grounding will become an industry standard method for reducing or eliminating LLM inaccuracies, according to some observers.
“If we don’t ground and try to fix the hallucinations, then AI will not be successful,” said Sanjeev Mohan, principal and founder of SanjMo, a data trend advisory firm.
GenAI competition
Bursting into public view with OpenAI’s introduction of ChatGPT in November 2022, the GenAI race has become a monthly leapfrog exercise, with the vendors striving to outdo each other on LLM features, size, power and other attributes.
AWS has an event in New York City on July 10 at which it’s expected to roll out GenAI releases to try to catch up with Google and OpenAI. OpenAI made its own splash last month with GPT-4o, acquired streaming database vendor Rockset last week and is expected to make another big move soon.
Meanwhile, smaller AI vendors are touting the virtues of highly customizable small language models that require far less compute.
At a media and analyst briefing on June 26, Google Cloud CEO Thomas Kurian touted the Gemini 1.5 Flash model, aimed at mid-market enterprises looking for speed, affordability and a large context window, as superior to OpenAI’s GPT-3.5.
Google’s Gemini 1.5 Pro model has what is widely considered the industry’s biggest context window for entering prompt information into an LLM: 2 million tokens.
“The generally available Gemini 1.5 Flash is the fastest model at the best price-to-performance option on the market,” Kurian said.
LLM memory
One of the features Google promotes as lowering the cost of large-scale LLM processing is context caching.
In public preview for both the 1.5 Pro and Flash models in APIs on Vertex AI, context caching is a form of in-LLM memory that enables users to store lengthy context information in the context window without having to re-input it for each prompt.
Caching lowers costs by reducing token usage and boosts speed by efficiently re-using previously stored context such as long documents or video footage, according to Google.
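For developers, the workflow is to create a cache once and then bind a model to it, so subsequent prompts reuse the stored tokens. The following is a minimal sketch assuming the Vertex AI Python SDK’s preview caching interface; the project, bucket path and model version are placeholders.

```python
import datetime

import vertexai
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel, Part

# Placeholder project and region; substitute your own.
vertexai.init(project="my-project", location="us-central1")

# Cache a long document once. The bucket path is a hypothetical file
# standing in for any lengthy context, such as a report or video.
cached = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    contents=[
        Part.from_uri(
            "gs://my-bucket/annual-report.pdf",
            mime_type="application/pdf",
        )
    ],
    ttl=datetime.timedelta(hours=1),  # how long the cache lives
)

# Bind a model to the cache; each prompt now reuses the cached tokens
# instead of re-sending the full document.
model = GenerativeModel.from_cached_content(cached_content=cached)

response = model.generate_content("Summarize the key risk factors.")
print(response.text)
```

Per Google, the savings scale with how often the same stored context is reused across prompts, since cached tokens are not re-sent and re-processed with every request.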
Some models from other vendors also provide forms of caching to deal with context window size limitations.
But Google’s advantage in the competition for enterprise customers with its GenAI products is its control of huge amounts of data, Mohan said.
“They have a big installed base, and they have a massive amount of data to train and test their models on,” he said. “For example, they’ve got four products with a billion users every day: Ads, Gmail, YouTube and Google Search.”
In addition to its own Gemini, Imagen and Gemma models, Google has pointedly pursued a multi-model strategy, with more than 150 third-party LLMs and foundation models in Vertex AI’s Model Garden. AWS has taken a similar route with the Amazon Bedrock GenAI platform.
On Thursday, Google added independent GenAI vendor Anthropic’s new Claude 3.5 Sonnet model, released on June 20, to Vertex AI.
Also, Google said it will add the popular Mistral Small, Mistral Large and Codestral LLMs to Model Garden later this summer.
“What customers want is optionality. They want certain models that are expert and trained for certain things,” Mohan said.
At the media and analyst briefing, Google sought to show that some of its customers have moved beyond experimenting with GenAI applications to putting them into production.
Nick Reed, chief product officer at Moody’s, said the financial services giant has been using Google’s grounding technology to improve accuracy in credit ratings and the Gemini models’ large context windows to extract knowledge from big data sets and documents.
Shaun Sutner is senior news director for TechTarget Editorial’s information management team, driving coverage of artificial intelligence, unified communications, analytics and data management technologies. He is a veteran journalist with more than 30 years of news experience.