Researchers at Google DeepMind in London have developed a new, seamless way to watermark AI-generated text. Publishing their findings in the journal Nature, the researchers say the technique can identify when a piece of writing was generated by Google’s own AI model, Gemini. Beyond giving some industries peace of mind, it could also stop AI models from cannibalizing their own output and sliding into model collapse.
Google’s interest in a watermarking solution shouldn’t come as a surprise. Google doesn’t just operate its own AI, Gemini; through its search engine, it is also on the front line of judging content quality from human and AI sources alike.
To that end, Google continually refines its search algorithms to better judge pages on writing quality and user intent, directing users to the content they want through keywords and search trends. For example, it associates keywords like ‘buy’ with product-selling sites and ‘play’ with iGaming sites, so a ‘play’ search for online roulette will return a site like PeerGame, where digital roulette, blackjack, and other games are on offer. This works straightforwardly for sites that offer services, such as e-commerce and iGaming sites. For sites built around the written word, however, Google needs to be more stringent about what that content says and how it was produced.
At first, Google’s stance on AI-generated writing leaned negative out of an abundance of caution. But following the familiar SEO mantra that content is king, it relaxed its Google Search guidelines to accept AI-generated content, provided the result is still high-quality material. This seems like a sensible approach, rewarding those who use AI to level up skills and services they already offer.
Shortly after OpenAI brought generative AI into the mainstream with ChatGPT, Google revealed its own AI service, Bard. One year and one renaming later, Gemini is now available on desktop and on Android mobiles, and it’s even built into Google’s search engine.
Like every large language model (LLM), it can generate text. Now DeepMind researchers in London have published a seamless, invisible way to watermark Gemini’s writing output, so others can tell when content is AI-generated. It isn’t the first attempt at watermarking AI writing, but it’s the first that can be applied at scale, in real time, with no performance loss for Gemini. The tool, called SynthID-Text, made its debut in the Nature study.
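The paper’s actual scheme (a tournament-style sampling procedure) is more involved than can fit in a short sketch, but the general family of generation-time watermarks is easy to illustrate. In the hypothetical Python below, which is not SynthID-Text’s algorithm and where the key, names, and scoring are all illustrative assumptions, a keyed hash deterministically marks some candidate tokens as ‘green’ for each context, and the model’s scores are quietly nudged toward them. The nudge is statistically detectable over many tokens while leaving any single word choice looking natural.

```python
import hashlib

# Illustrative sketch of a generation-time text watermark using a simple
# keyed "green list" scheme. Every name and constant here is hypothetical;
# SynthID-Text's real algorithm is more sophisticated.

SECRET_KEY = b"demo-watermark-key"  # assumption: shared with the detector

def is_green(prev_token: str, candidate: str) -> bool:
    """Keyed hash marks roughly half the vocabulary 'green' per context."""
    digest = hashlib.sha256(
        SECRET_KEY + prev_token.encode() + b"|" + candidate.encode()
    ).digest()
    return digest[0] % 2 == 0

def pick_next_token(
    prev_token: str,
    scored_candidates: list[tuple[str, float]],
    bias: float = 2.0,
) -> str:
    """Choose the next token, quietly boosting green-listed candidates.

    scored_candidates: (token, model_score) pairs from the LLM.
    A small bias shifts the statistics over many tokens without making
    any single choice look unnatural.
    """
    def biased(pair: tuple[str, float]) -> float:
        token, score = pair
        return score + (bias if is_green(prev_token, token) else 0.0)

    return max(scored_candidates, key=biased)[0]

# Example: near-tied candidates tip toward whichever one is green here.
print(pick_next_token("the", [("cat", 1.0), ("dog", 0.9)]))
```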
The interest in watermarking AI-generated text comes from a few places. It’s a great way to cut down on misinformation, but it’s also a self-preservation strategy: it can stop AI-generated text from being fed back into other AI models as so-called recursive data. Training on recursive data degrades a model’s output; over successive generations the model loses its ability to produce human-quality writing, and in serious cases it can unravel entirely. Recursive data poses a big threat to future LLMs, but a reliable method of spotting AI-generated writing would defuse that problem.
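To see how a watermark helps here, consider the detector side of the same hypothetical scheme sketched above (again an illustrative assumption, not Google’s detector). A model developer could score the fraction of ‘green’ tokens in each document and filter suspected AI output from training corpora before it becomes recursive data: unwatermarked text should land near 0.5 by chance, while watermarked text runs measurably higher.

```python
import hashlib

SECRET_KEY = b"demo-watermark-key"  # assumption: same key as the generator

def is_green(prev_token: str, token: str) -> bool:
    digest = hashlib.sha256(
        SECRET_KEY + prev_token.encode() + b"|" + token.encode()
    ).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens: list[str]) -> float:
    """Fraction of tokens on the keyed green list.
    Unwatermarked text hovers near 0.5; watermarked text runs higher."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.5
    return sum(is_green(p, t) for p, t in pairs) / len(pairs)

def keep_for_training(doc_tokens: list[str], threshold: float = 0.7) -> bool:
    """Training-data hygiene: drop documents that look watermarked."""
    return green_fraction(doc_tokens) <= threshold

corpus = [["the", "cat", "sat", "on", "the", "mat"]]  # hypothetical tokenized docs
clean_corpus = [doc for doc in corpus if keep_for_training(doc)]
```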
Google has open-sourced the SynthID-Text tool, hoping that “other AI-Model developers pick this up and integrate it with their own systems.” Unless other developers build competing tools of their own, Google’s solution stands a good chance of becoming a standard way of watermarking AI-generated output in the future.