Monday, December 23, 2024

Google tool makes AI-generated writing easily detectable

The probability that one word will follow another can be used to create a watermark for AI-generated text

Vikram Arun/Shutterstock

Google has been using artificial intelligence watermarking to automatically identify text generated by the company’s Gemini chatbot, making it easier to distinguish AI-generated content from human-written posts. That watermark system could help prevent misuse of AI chatbots for misinformation and disinformation – not to mention cheating in school and business settings.

Now, the tech company is making an open-source version of its technique available so that other generative AI developers can similarly watermark the output from their own large language models, says Pushmeet Kohli at Google DeepMind, the company’s AI research team, which combines the former Google Brain and DeepMind labs. “While SynthID isn’t a silver bullet for identifying AI-generated content, it is an important building block for developing more reliable AI identification tools,” he says.

Independent researchers voiced similar optimism. “While no known watermarking method is foolproof, I really think this can help in catching some fraction of AI-generated misinformation, academic cheating and more,” says Scott Aaronson at The University of Texas at Austin, who previously worked on AI safety at OpenAI. “I hope that other large language model companies, including OpenAI and Anthropic, will follow DeepMind’s lead on this.”

In May of this year, Google DeepMind announced that it had implemented its SynthID method for watermarking AI-generated text and video from Google’s Gemini and Veo AI services, respectively. The company has now published a paper in the journal Nature showing how SynthID generally outperformed similar AI watermarking techniques for text. The comparison involved assessing how readily responses from various watermarked AI models could be detected.

In Google DeepMind’s AI watermarking approach, as the model generates a sequence of text, a “tournament sampling” algorithm subtly nudges it toward selecting certain word “tokens”, creating a statistical signature that is detectable by associated software. This process randomly pairs up possible word tokens in a tournament-style bracket, with the winner of each pair being determined by which one scores highest according to a watermarking function. The winners move through successive tournament rounds until just one remains – a “multi-layered approach” that “increases the complexity of any potential attempts to reverse-engineer or remove the watermark”, says Furong Huang at the University of Maryland.
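The tournament idea can be illustrated with a toy sketch. This is not Google DeepMind's implementation: the ten-word vocabulary, the uniform stand-in "model", the SHA-256 scoring function, the secret key string, the four-token context window and the three tournament rounds are all assumptions made for illustration. Candidate tokens are drawn from the model, paired off, and the token with the higher pseudorandom score wins each pairing. A detector that knows the key then checks whether the average score of a text sits above the 0.5 expected by chance.

```python
import hashlib
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "tree"]


def g_score(token, context, layer, key):
    """Pseudorandom 0/1 score derived from the key, layer, recent context and token."""
    msg = f"{key}|{layer}|{' '.join(context)}|{token}".encode()
    return hashlib.sha256(msg).digest()[0] & 1


def toy_model(context):
    """Stand-in for a language model: a uniform distribution over VOCAB."""
    return [1.0 / len(VOCAB)] * len(VOCAB)


def tournament_sample(context, key, rng, layers=3):
    """Pick the next token by running a bracket over 2**layers sampled candidates."""
    candidates = rng.choices(VOCAB, weights=toy_model(context), k=2 ** layers)
    for layer in range(layers):
        winners = []
        for i in range(0, len(candidates), 2):
            a, b = candidates[i], candidates[i + 1]
            score_a = g_score(a, context, layer, key)
            score_b = g_score(b, context, layer, key)
            if score_a > score_b:
                winners.append(a)
            elif score_b > score_a:
                winners.append(b)
            else:
                winners.append(rng.choice([a, b]))  # tie: pick at random
        candidates = winners
    return candidates[0]


def generate(n_tokens, key, rng, watermark=True, window=4, layers=3):
    """Generate n_tokens, with or without tournament watermarking."""
    text = []
    for _ in range(n_tokens):
        context = text[-window:]
        if watermark:
            text.append(tournament_sample(context, key, rng, layers))
        else:
            text.append(rng.choices(VOCAB, weights=toy_model(context), k=1)[0])
    return text


def detect(text, key, window=4, layers=3):
    """Mean score over all tokens and layers; roughly 0.5 without a watermark."""
    total = 0
    for i, token in enumerate(text):
        context = text[max(0, i - window):i]
        total += sum(g_score(token, context, layer, key) for layer in range(layers))
    return total / (len(text) * layers)


if __name__ == "__main__":
    rng = random.Random(0)
    marked = generate(400, "secret-key", rng, watermark=True)
    plain = generate(400, "secret-key", rng, watermark=False)
    print("watermarked mean score:", round(detect(marked, "secret-key"), 3))
    print("unwatermarked mean score:", round(detect(plain, "secret-key"), 3))
```

In this sketch the signal builds up token by token, so a short text gives the detector fewer scores to average, and a prompt with essentially one acceptable answer leaves the tournament little to choose between.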

A “determined adversary” with huge amounts of computational power could still remove such AI watermarks, says Hanlin Zhang at Harvard University. But he says SynthID’s approach makes sense given the need for scalable watermarking in AI services.

The Google DeepMind researchers tested two versions of SynthID that trade off detectability against fidelity: one makes the watermark signature easier to detect at the expense of distorting the text an AI model would typically generate, while the other is non-distortionary. They showed that the non-distortionary version still worked, without noticeably affecting the quality of 20 million Gemini-generated text responses during a live experiment.

But the researchers also acknowledged that the watermarking works best with longer chatbot responses that can be answered in a variety of ways – such as generating an essay or email – and said it has not yet been tested on responses to maths or coding problems.

Both Google DeepMind’s team and others described the need for additional safeguards against misuse of AI chatbots – with Huang recommending stronger regulation as well. “Mandating watermarking by law would address both the practicality and user adoption challenges, ensuring a more secure use of large language models,” she says.
