Google’s lead privacy regulator in the European Union has opened an investigation into whether the company has complied with the bloc’s data protection laws in its use of people’s information to train generative AI.
Specifically it’s looking into whether the tech giant needed to carry out a data protection impact assessment (DPIA) in order to proactively consider the risks its AI technologies might pose to the rights and freedoms of individuals whose information was used to train the models.
Generative AI tools are infamous for producing plausible-sounding falsehoods. That tendency, combined with an ability to serve up personal information on demand, creates a lot of legal risk for their makers. Ireland’s Data Protection Commission (DPC), which oversees Google’s compliance with the bloc’s General Data Protection Regulation (GDPR), has powers to levy fines of up to 4% of the global annual turnover of Alphabet, Google’s parent entity, for any confirmed breaches.
Google has developed several generative AI tools, including a whole family of general-purpose large language models (LLMs) it has branded Gemini (formerly Bard). It uses the technology to power AI chatbots and to enhance web search. Underlying these consumer-facing AI tools is a Google LLM called PaLM 2, which it launched last year at its I/O developer conference.
How Google developed this foundational AI model is what the Irish DPC says it’s investigating, under Section 110 of Ireland’s Data Protection Act 2018, which transposed the GDPR into national law.
The training of GenAI models typically requires vast amounts of data, and the types of information that LLM makers have acquired, as well as how and where they got it, is being increasingly scrutinized in relation to a range of legal concerns, including copyright and privacy.
In the latter case, information used as AI training fodder that contains the personal information of people in the EU is subject to the bloc’s data protection rules, whether it was scraped off the public internet or acquired directly from users. This is why a number of LLM makers have already faced questions — and some GDPR enforcement — related to privacy compliance, including OpenAI, the maker of GPT (and ChatGPT), and Meta, which develops the Llama AI model.
Elon Musk-owned X has also attracted GDPR complaints and the DPC’s ire over the use of people’s data for AI training, leading to a court proceeding and an undertaking by X to limit its data processing, but no sanction. However, X could still face a GDPR penalty if the DPC determines that its processing of user data to train its AI tool Grok breached the regime.
The DPC’s DPIA probe into Google’s GenAI is the latest regulatory action in this area.
“The statutory inquiry concerns the question of whether Google has complied with any obligations that it may have had to undertake an assessment, pursuant to Article 35 of the General Data Protection Regulation (Data Protection Impact Assessment), prior to engaging in the processing of the personal data of EU/EEA data subjects associated with the development of its foundational AI Model, Pathways Language Model 2 (PaLM 2),” the DPC wrote in a press release.
It points out that a DPIA can be of “crucial importance in ensuring that the fundamental rights and freedoms of individuals are adequately considered and protected when processing of personal data is likely to result in a high risk.”
“This statutory inquiry forms part of the wider efforts of the DPC, working in conjunction with its EU/EEA [European Economic Area] peer regulators, in regulating the processing of the personal data of EU/EEA data subjects in the development of AI models and systems,” the DPC added, referencing ongoing efforts by the bloc’s network of GDPR enforcers to reach some sort of consensus on how best to apply the privacy law on GenAI tools.
Google has been contacted for a response to the DPC’s inquiry.