Meta, the parent company of Facebook, is reportedly considering paying news organizations for access to their content to better train its AI language models. Such deals would aim to improve Meta's generative AI tools, including Meta AI, and make them more competitive with similar tools from Google and Microsoft. According to a report by Business Insider citing two sources familiar with the matter, Meta is holding internal discussions about potential deals with news publishers for broader access to news, photo, and video content.
One of the sources mentioned that Meta might eventually be compelled to pay for such access. However, Meta has not yet approached any news outlets about licensing content, and any agreements for data access would be distinct from previous deals where Meta compensated publishers to host links on its platforms.
Meta trains its AI models on its own data. CEO Mark Zuckerberg has claimed that the company's dataset for training its Llama models is larger than Common Crawl, a widely used collection of scraped web data. Even so, there are internal concerns about the quality of that proprietary data. Posts and comments on Facebook and Instagram, for instance, may not provide the high-quality material that generative AI chatbots and search tools need to produce good outputs, compared with more carefully written sources such as books, news articles, and essays.
The discussions follow OpenAI's recent multi-year agreement with News Corp, the media giant that owns The Wall Street Journal, MarketWatch, and The Sun. The deal, reportedly valued at $250 million over five years, grants OpenAI access to content from more than a dozen news publications to train its AI models. The partnership reflects a growing view among AI developers that high-quality content is essential for training advanced models, and it underscores the competitive pressure Meta faces.