Monday, November 4, 2024

Online publishers face a dilemma: Allow AI scraping from Google or lose search visibility

As the US government weighs its options following a landmark “monopolist” ruling against Google last week, online publications increasingly face a bleak future. (And this time, it’s not just because of severely diminished ad revenue.) Bloomberg reports that their choice now boils down to allowing Google to use their published content to produce inline AI-generated search “answers” or losing visibility in the company’s search engine.

The crux of the problem lies in Googlebot, the crawler that scours and indexes the live web to produce the results you see when you enter search terms. If publishers block Google from using their content for the AI-produced answers you now see littered at the top of many search results, they also lose the privilege of including their web pages in the standard web results.

The catch-22 has led publications, rival search engines and AI startups to pin their hopes on the Justice Department. On Tuesday, The New York Times reported that the DOJ is considering asking a federal judge to break up parts of the company (spinning off sections like Chrome or Android). Other options it’s reportedly weighing include forcing Google to share search data with competitors or to relinquish its default search-engine deals, like the $18 billion one it inked with Apple.

Google uses a separate crawler for its Gemini (formerly Bard) chatbot. But its main crawler covers both AI Overviews and standard searches, leaving web publishers with little (if any) leverage. If you let Google scrape your content for AI Overview answers, readers may treat the answer as the end of the matter without bothering to visit your site (meaning zero revenue from those potential readers). But if you block Googlebot, you lose search visibility, which likely means significantly less short-term income and a colossal loss of long-term competitive standing.

iFixit CEO Kyle Wiens told Bloomberg, “I can block ClaudeBot [Anthropic’s crawler for its Claude chatbot] from indexing us without harming our business. But if I block Googlebot, we lose traffic and customers.”
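To make the asymmetry concrete, here is a minimal sketch of how a publisher's robots.txt rules play out per crawler, using Python's standard-library parser. The robots.txt content and the example.com URL are hypothetical. ClaudeBot and Googlebot are the crawler names mentioned above. Google-Extended is assumed here to be the token Google documents for opting out of Gemini-related use, so check Google's current crawler documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the AI-only crawlers, keep Googlebot.
# "Google-Extended" is an assumed token name; verify it against Google's docs.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""


def crawler_access(robots_txt, url, agents):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = crawler_access(
    ROBOTS_TXT,
    "https://example.com/guide",  # hypothetical page
    ["ClaudeBot", "Google-Extended", "Googlebot"],
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Nothing in a file like this can separate Googlebot's search indexing from its AI Overview use, because both run through the same user agent. That is the bind the article describes: the only rule that would keep content out of AI Overviews is the one that also removes the site from search.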

A sample Google search query with an AI Overview answer. (Image: Google)

Another problem with using one crawler for both search indexing and AI answers is that it gives Google an immeasurable advantage over smaller AI startups. The company gets a plethora of free training data from publishers eager to remain visible in search. In contrast, other AI companies are forced to pay publishers for access to their data, and even then, it wouldn’t add up to the mother lode Google gets (essentially) for free.

From that perspective, it isn’t surprising to read that, according to Bloomberg, Google is spurning publishers that try to negotiate content deals. (Reddit has been the lone exception.) Why waste money on content deals when it gets all the training data it wants in exchange for the search results most publishers need to survive?

“Now you have a bunch of tech companies that are paying for content, they’re paying for access to that because they need it to be able to compete in any kind of serious way,” Alex Rosenberg, CEO of AI startup Tako Inc., told Bloomberg. “Whereas for Google, they don’t really have to do that.”

It comes down to leverage, which Google wields over desperate publishers. On top of the industry’s existing financial troubles (online ad revenue has fallen off a cliff over the past eight years), AdWeek reported in March that Google’s AI-generated search answers could lead to a 20 to 60 percent drop in organic search traffic.

The ball is now in the Justice Department’s court to figure out where Google — and, to an extent, the entire web — goes from here. Bloomberg’s full story is worth a read.
