Why Google’s AI Overviews gets things wrong (2024)

MIT Technology Review Explains: Let our writers untangle the complex, messy world of technology to help you understand what's coming next.

When Google announced it was rolling out its artificial-intelligence-powered search feature earlier this month, the company promised that “Google will do the googling for you.” The new feature, called AI Overviews, provides brief, AI-generated summaries highlighting key information and links on top of search results.

Unfortunately, AI systems are inherently unreliable. Within days of AI Overviews’ release in the US, users were sharing examples of responses that were strange at best. The feature suggested that users add glue to pizza or eat at least one small rock a day, and claimed that former US president Andrew Johnson earned university degrees between 1947 and 2012, despite dying in 1875.

On Thursday, Liz Reid, head of Google Search, announced that the company has been making technical improvements to the system to make it less likely to generate incorrect answers, including better detection mechanisms for nonsensical queries. It is also limiting the inclusion of satirical, humorous, and user-generated content in responses, since such material could result in misleading advice.

But why is AI Overviews returning unreliable, potentially dangerous information? And what, if anything, can be done to fix it?

How does AI Overviews work?

In order to understand why AI-powered search engines get things wrong, we need to look at how they’ve been optimized to work. We know that AI Overviews uses a new generative AI model in Gemini, Google’s family of large language models (LLMs), that’s been customized for Google Search. That model has been integrated with Google’s core web ranking systems and designed to pull out relevant results from its index of websites.

Most LLMs simply predict the next word (or token) in a sequence, which makes them appear fluent but also leaves them prone to making things up. They have no ground truth to rely on, but instead choose each word purely on the basis of a statistical calculation. That leads to hallucinations. It’s likely that the Gemini model in AI Overviews gets around this by using an AI technique called retrieval-augmented generation (RAG), which allows an LLM to check specific sources outside of the data it’s been trained on, such as certain web pages, says Chirag Shah, a professor at the University of Washington who specializes in online search.
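To see why pure next-token prediction has no notion of truth, consider this toy sketch in Python. The bigram probabilities are made up for illustration and bear no resemblance to Gemini's internals; the point is only that each word is sampled from a learned distribution, and no step consults the real world.

```python
import random

# Made-up next-word probabilities; a real LLM learns billions of these
# from training data. Nothing here encodes whether a statement is true.
next_word_probs = {
    "the": {"cheese": 0.5, "pizza": 0.5},
    "cheese": {"melts": 0.6, "slides": 0.4},
    "pizza": {"bakes": 0.7, "burns": 0.3},
}

def generate(word, steps=4):
    """Extend the sequence by sampling each next word from its distribution."""
    out = [word]
    for _ in range(steps):
        choices = next_word_probs.get(out[-1])
        if not choices:
            break  # no learned continuation for this word
        words, weights = zip(*choices.items())
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

# Fluent-sounding output, but nothing in the loop checks whether it is true.
print(generate("the"))
```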

Once a user enters a query, it’s checked against the documents that make up the system’s information sources, and a response is generated. Because the system is able to match the original query to specific parts of web pages, it’s able to cite where it drew its answer from—something normal LLMs cannot do.

One major upside of RAG is that the responses it generates to a user’s queries should be more up to date, more factually accurate, and more relevant than those from a typical model that just generates an answer based on its training data. The technique is often used to try to prevent LLMs from hallucinating. (A Google spokesperson would not confirm whether AI Overviews uses RAG.)
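For intuition, here is a minimal RAG sketch in Python, under the article's assumption that something like this sits behind AI Overviews. Everything in it is an illustrative stand-in: retrieval is naive keyword overlap, and llm_generate is a hypothetical placeholder for a real model call; neither reflects Google's actual systems.

```python
import string

def words(text):
    """Lowercase and strip punctuation for naive keyword matching."""
    return set(text.lower().translate(str.maketrans("", "", string.punctuation)).split())

def retrieve(query, documents, k=1):
    """Rank documents by how many query words they share; keep the top k."""
    return sorted(documents, key=lambda d: len(words(query) & words(d["text"])), reverse=True)[:k]

def llm_generate(prompt):
    # Hypothetical model call; a real system would invoke an LLM here.
    return f"[answer conditioned on]\n{prompt}"

def answer(query, documents):
    sources = retrieve(query, documents)
    # Conditioning generation on specific passages is what lets the system
    # cite them, something a plain LLM answering from training data cannot do.
    context = "\n".join(f"[{d['url']}] {d['text']}" for d in sources)
    return llm_generate(f"Sources:\n{context}\n\nQuestion: {query}\nAnswer, citing sources:")

docs = [
    {"url": "reddit.com/r/Pizza", "text": "Add some glue to the pizza sauce so the cheese sticks."},  # a joke post
    {"url": "example-cooking-site.com", "text": "Let your pizza rest so the melted cheese can set."},
]
print(answer("cheese not sticking to pizza", docs))
```

Note the failure mode baked in: by word overlap alone, the joke post scores as the most “relevant” source, and nothing downstream questions whether it is right, which is exactly the weakness Shah describes below.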

So why does it return bad answers?

But RAG is far from foolproof. In order for an LLM using RAG to come up with a good answer, it has to both retrieve the information correctly and generate the response correctly. A bad answer results when one or both parts of the process fail.

In the case of AI Overviews’ recommendation of a pizza recipe that contains glue—drawing from a joke post on Reddit—it’s likely that the post appeared relevant to the user’s original query about cheese not sticking to pizza, but something went wrong in the retrieval process, says Shah. “Just because it’s relevant doesn’t mean it’s right, and the generation part of the process doesn’t question that,” he says.

Similarly, if a RAG system comes across conflicting information, like a policy handbook and an updated version of the same handbook, it’s unable to work out which version to draw its response from. Instead, it may combine information from both to create a potentially misleading answer.

“The large language model generates fluent language based on the provided sources, but fluent language is not the same as correct information,” says Suzan Verberne, a professor at Leiden University who specializes in natural-language processing.

The more specific a topic is, the higher the chance of misinformation in a large language model’s output, she says, adding: “This is a problem in the medical domain, but also education and science.”

According to the Google spokesperson, in many cases when AI Overviews returns incorrect answers it’s because there’s not a lot of high-quality information available on the web to show for the query—or because the query most closely matches satirical sites or joke posts.

The spokesperson says the vast majority of AI Overviews provide high-quality information and that many of the examples of bad answers were in response to uncommon queries, adding that AI Overviews containing potentially harmful, obscene, or otherwise unacceptable content came up in response to less than one in every 7 million unique queries. Google is continuing to remove AI Overviews on certain queries in accordance with its content policies.

It’s not just about bad training data

Although the pizza glue blunder is a good example of a case where AI Overviews pointed to an unreliable source, the system can also generate misinformation from factually correct sources. Melanie Mitchell, an artificial-intelligence researcher at the Santa Fe Institute in New Mexico, googled “How many Muslim presidents has the US had?” AI Overviews responded: “The United States has had one Muslim president, Barack Hussein Obama.”

While Barack Obama is not Muslim, making AI Overviews’ response wrong, it drew its information from a chapter in an academic book titled Barack Hussein Obama: America’s First Muslim President? So not only did the AI system miss the entire point of the essay, it interpreted it in the exact opposite of the intended way, says Mitchell. “There’s a few problems here for the AI; one is finding a good source that’s not a joke, but another is interpreting what the source is saying correctly,” she adds. “This is something that AI systems have trouble doing, and it’s important to note that even when it does get a good source, it can still make errors.”

Can the problem be fixed?

Ultimately, we know that AI systems are unreliable, and so long as they are using probability to generate text word by word, hallucination is always going to be a risk. And while AI Overviews is likely to improve as Google tweaks it behind the scenes, we can never be certain it’ll be 100% accurate.

Google has said that it’s adding triggering restrictions for queries where AI Overviews were not proving to be especially helpful and has added additional “triggering refinements” for queries related to health. The company could add a step to the information retrieval process designed to flag a risky query and have the system refuse to generate an answer in these instances, says Verberne. Google doesn’t aim to show AI Overviews for explicit or dangerous topics, or for queries that indicate a vulnerable situation, the company spokesperson says.
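As a sketch of what that refusal step might look like, here is a toy gate in Python; the keyword list and the downstream call are invented stand-ins, not Google's policy machinery, and a production system would use a trained classifier rather than string matching.

```python
# Invented list of sensitive topics, standing in for a real risk classifier.
RISKY_TOPICS = {"health", "medical", "dosage", "self-harm"}

def is_risky(query):
    """Crude keyword flag; illustrative only."""
    return any(topic in query.lower() for topic in RISKY_TOPICS)

def run_rag_pipeline(query):
    # Hypothetical downstream RAG call (see the earlier sketch).
    return f"AI Overview for: {query}"

def overview(query):
    if is_risky(query):
        return None  # suppress the AI Overview; fall back to classic results
    return run_rag_pipeline(query)

print(overview("best pizza dough hydration"))    # generates an overview
print(overview("medical dosage for ibuprofen"))  # None: no overview shown
```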

Techniques like reinforcement learning from human feedback, which incorporates such feedback into an LLM’s training, can also help improve the quality of its answers.

Similarly, LLMs could be trained specifically for the task of identifying when a question cannot be answered, and it could also be useful to instruct them to carefully assess the quality of a retrieved document before generating an answer, Verberne says: “Proper instruction helps a lot!”
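One way to express that instruction is directly in the prompt. The wording below is illustrative, not a recipe Verberne or Google has published: the model is told to grade each retrieved source and to abstain when nothing trustworthy remains.

```python
# Illustrative prompt template: ask the model to assess source quality
# before answering, and give it an explicit way to decline.
ASSESS_PROMPT = """Before answering, rate each source below as RELIABLE or
UNRELIABLE (satire, jokes, forum banter, outdated material). Use only
RELIABLE sources. If no source is RELIABLE, reply exactly:
"I cannot answer this reliably."

Sources:
{sources}

Question: {question}
"""

def build_prompt(question, sources):
    listing = "\n".join(f"- {s}" for s in sources)
    return ASSESS_PROMPT.format(sources=listing, question=question)

print(build_prompt(
    "How many rocks should I eat per day?",
    ["The Onion: Geologists recommend eating at least one small rock a day."],
))
```

The abstain option matters: a model that must always produce an answer will answer badly when every retrieved source is a joke.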

Although Google has added a label to AI Overviews answers reading “Generative AI is experimental,” it should consider making it much clearer that the feature is in beta and emphasizing that it is not ready to provide fully reliable answers, says Shah. “Until it’s no longer beta—which it currently definitely is, and will be for some time—it should be completely optional. It should not be forced on us as part of core search.”

