Monday, December 23, 2024

Why are Google’s AI Overviews results so bad?


In the case of AI Overviews recommending a pizza recipe that contains glue—drawing from a joke post from Reddit—it’s likely that the post appeared relevant to the user’s original query about cheese not sticking to pizza, but that something went wrong in the retrieval process, says Shah. “Just because it’s relevant doesn’t mean it’s right, and the generation part of the process doesn’t question that,” he says.

Similarly, if a RAG system comes across conflicting information, such as an outdated policy handbook and an updated version of the same handbook, it is unable to work out which version to draw its response from. Instead, it may combine information from both to create a potentially misleading answer.
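This retrieval failure is easy to reproduce in a toy pipeline. The sketch below is purely illustrative (the keyword-overlap scoring, the document store, and the prompt format are invented for this example, not a description of Google's system): the retriever ranks documents only by how relevant they look to the query, so two conflicting handbook versions score identically and both land in the context the model generates from.

```python
# Toy RAG retrieval step: rank documents by keyword overlap with the query
# and stuff the top hits into a prompt. Real systems use learned embeddings,
# but the failure mode is the same: relevance scoring has no notion of
# which source is current or correct.

def score(query: str, doc: str) -> int:
    """Count query words that appear in the document (naive relevance)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k most 'relevant' documents -- nothing checks recency."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

docs = [
    "Handbook v1 (2022): employees accrue 10 vacation days per year.",
    "Handbook v2 (2024): employees accrue 15 vacation days per year.",
    "Cafeteria menu: pizza served on Fridays.",
]

context = retrieve("how many vacation days do employees accrue", docs)
# Both handbook versions score equally well, so both end up in the prompt,
# and the generator may blend them into one misleading answer.
prompt = "Answer using these sources:\n" + "\n".join(context)
```

Nothing in this loop asks which handbook supersedes the other; that judgment is left to the language model, which, as the researchers quoted here note, tends to fuse the sources rather than adjudicate between them.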

“The large language model generates fluent language based on the provided sources, but fluent language is not the same as correct information,” says Suzan Verberne, a professor at Leiden University who specializes in natural language processing.

The more specific a topic is, the higher the chance of misinformation in a large language model’s output, she says, adding: “This is a problem in the medical domain, but also education and science.”

According to the Google spokesperson, in many cases when AI Overviews returns incorrect answers, it is because there is not much high-quality information available on the web for the query, or because the query most closely matches satirical sites or joke posts.

The spokesperson says that the vast majority of AI Overviews provide high-quality information and that many of the examples of AI Overviews' bad answers were in response to uncommon queries, adding that AI Overviews containing potentially harmful, obscene, or otherwise violative content appeared in fewer than one in every 7 million unique queries. Google is continuing to remove AI Overviews on certain queries in accordance with its content policies.

It’s not just about bad training data

Although the pizza glue blunder is a good example of AI Overviews pointing to an unreliable source, AI Overviews can still generate misinformation from factually correct sources. Melanie Mitchell, an artificial intelligence researcher at the Santa Fe Institute in New Mexico, googled ‘How many Muslim presidents has the US had?’, to which AI Overviews responded: ‘The United States has had one Muslim president, Barack Hussein Obama.’ 

While Barack Obama himself is not Muslim, making AI Overviews’ response wrong, it drew its information from a chapter in an academic book titled Barack Hussein Obama: America’s First Muslim President? So not only did the AI system miss the entire point of the essay, it interpreted it in the exact opposite way, says Mitchell. “There’s a few problems here for the AI; one is finding a good source that’s not a joke, but another is interpreting what the source is saying correctly,” she adds. “This is something that AI systems have trouble doing, and it’s important to note that even when it does get a good source, it can still make errors.”
