A hot potato: Data has become a treasured resource for companies seeking to train new AI models. With many publicly available sources rapidly running dry, firms are turning to private datasets protected by privacy laws. To give themselves cover, these companies are quietly inserting small changes into their privacy statements that allow them to use the data in this way.
Earlier this year, the Federal Trade Commission warned that companies would be sorely tempted to change the terms and conditions of their privacy statements to allow them to use their customers' data to train AI models. To avoid a backlash from privacy-conscious users, companies may try to make these changes quietly and with little fanfare, the commission said. It added, however, that any firm reneging on its user privacy commitments in this way risks running afoul of the law.
“It may be unfair or deceptive for a company to adopt more permissive data practices – for example, to start sharing consumers’ data with third parties or using that data for AI training – and to only inform consumers of this change through a surreptitious, retroactive amendment to its terms of service or privacy policy,” it said in no uncertain terms.
But this is precisely what is happening, according to an analysis by The New York Times.
As companies seek data to train their AI models, they are increasingly turning to data protected by privacy laws. To give themselves legal cover, they are carefully rewriting their terms and conditions to include phrases like "artificial intelligence," "machine learning," and "generative AI."
Google is just one example. Last July, it made several key tweaks to its privacy policy, which now states that Google uses publicly available information to help train its AI language models and develop products like Google Translate, Bard (now Gemini), and Cloud AI capabilities.
Google explained the change to the Times, saying it "simply clarified that newer services like Bard (now Gemini) are also included. We did not start training models on additional types of data based on this language change."
Last month, Adobe made a similar change and faced customer backlash. A popup notified users of the update, suggesting that the company could access and claim ownership of content created with its Creative Suite to train AI models, among other purposes. Many users were furious, especially upon realizing they could not access their projects without first agreeing to the confusing new terms. The uproar led to a wave of canceled subscriptions and forced Adobe to issue a clarification about the updated terms.
In May, Meta informed its Facebook and Instagram users in Europe that it would use publicly available posts to train its AI. However, after complaints from the European Center for Digital Rights in 11 European countries, Meta paused these plans.
It is easier for Meta to gather data from its US users, where consumer protections are weaker and oversight is split among a patchwork of state and federal agencies, including the FTC.
It remains to be seen what actions the commission will take as more privacy policies are changed to incorporate AI data training.