Tuesday, November 5, 2024

OpenAI, Google ink deals to augment AI efforts with news

Must read

OpenAI and Google on Thursday independently announced fresh collaborations with major publishers as they work to expand paid access to information used by their AI products and services.

OpenAI is now working with Time Magazine under a multi-year agreement, the terms of which weren’t disclosed, that grants the machine-learning mega-lab access to the publisher’s assets – from the present way back to 1923.

“We’re partnering with Time to make it easier for people to access news content through our AI tools, and to support reputable journalism by providing proper attribution to original sources,” OpenAI COO Brad Lightcap declared in a canned statement.

It isn’t entirely clear how Time’s content will be integrated into OpenAI’s products. The AI upstart simply noted the catalog will “enhance its products and display in response to user inquiries – featuring a citation and link back to the original source on Time.com.”

This suggests the use of retrieval augmented generation (RAG) to parse Time articles, in order to generate more accurate answers to ChatGPT users’ queries.

We covered RAG in depth in a recent hands-on tutorial. In a nutshell, the technique ingests content, converts it into an embedding model and feeds it into a vector database that a large language model, such as GPT-4, can use to generate answers. In a sense RAG is a bit like sending your LLM to the library to research a topic – or in the case of this deal, a newsstand.

There’s nothing in OpenAI’s announcement to suggest it won’t integrate Time’s archives into its training datasets. We’ve asked OpenAI and Time for comment.

There’s also no word on how much cash is changing hands, but the publisher will receive access to OpenAI’s technology to develop “new products for its audiences.” Additionally, the publication will get the opportunity to share its input on how to “refine and enhance the delivery of journalism in ChatGPT and other OpenAI products.”

Exactly how much sway Time journalists will actually have over OpenAI – or the use of Time content – is unclear.

The announcement comes weeks after OpenAI announced a partnership with Reddit and News Corp to feed its models with content.

Google does deals, too

Google, which announced a similar agreement with Reddit back in February, revealed on Thursday that it’s expanding its content partnerships to include several more content and data providers.

The search and ad giant’s collaborators include credit rating outfit Moody’s, investment research firm MSCI, Thomson Reuters, and Zoominfo. However, unlike OpenAI’s partnership with Time, Google specifically mentions the use of these sites as fodder for its RAG databases.

Google will offer its customers an option to “ground” their AI apps in facts sourced from the sites it’s signed. The hope is that more outside sources will reduce the likelihood of hallucinations – aka errors.

RAG should improve the quality of AI-generated results, but it isn’t a silver bullet. Even if a model doesn’t make something up, inaccurate source data remains a problem. As we saw last month with Google’s own AI search function, the quality of the answers relies heavily on the quality of the source. This led to the AI search presenting troll posts – including one suggesting adding “non-toxic” glue to pizza sauce to help the cheese stick – as fact.

However, as customer expectations of machine learning tools grow, so too has the demand for high quality datasets and – by extension – the need to protect curated sources of content from being hoovered up into the AI vacuum cleaner.

Earlier this week, Reddit said it would start making changes to its Robots.txt file to dissuade AI startups from scraping its site without paying for it – something we’ll note OpenAI and Google have already agreed to do.

These deals are also rather lucrative, with Google reportedly paying Reddit $60 million a year for the right to peruse its posts. ®

PS: That’s nice for Time. But the Center for Investigative Reporting, a nonprofit that operates Mother Jones, is now suing OpenAI and its backer Microsoft for ingesting its copyrighted content allegedly without permission nor compensation.

Latest article