This has been a wild year for AI. If you’ve spent much time online, you’ve probably come across images generated by AI systems like DALL-E 2 or Stable Diffusion, or jokes, essays, or other text written by ChatGPT, the latest incarnation of OpenAI’s large language model GPT-3.
Sometimes it’s obvious when an AI has created an image or piece of text. But increasingly, the output these models generate can easily fool us into thinking it was made by a human. And large language models, in particular, are confident bullshitters: they produce text that sounds correct but may in fact be full of falsehoods.
While this doesn’t matter if it’s just a bit of fun, it can have serious consequences if AI models are used to offer unfiltered health advice or other forms of important information. AI systems could also make it cheap and easy to churn out reams of misinformation, abuse, and spam, distorting the information we consume and even our sense of reality. That could be especially worrying around elections, for example.
The proliferation of these large, easily accessible language models raises an important question: How will we know whether what we read online is written by a human or a machine? I just published a story about the tools we currently have for detecting AI-generated text. Spoiler alert: Today’s detection toolkit is woefully inadequate against ChatGPT.
But there is a more serious long-term implication. We may be witnessing, in real time, the birth of a snowball of bullshit.
Large language models are trained on data sets that are built by scraping the internet for text, including all the toxic, silly, false, and malicious things humans have written online. The finished AI models regurgitate these falsehoods as fact, and their output is spread everywhere online. Tech companies scrape the internet again, collecting AI-written text that they use to train bigger, more convincing models, which humans can then use to generate even more nonsense before it is scraped again and again, ad infinitum.
This problem, AI feeding on itself and producing increasingly polluted output, extends to images. “The Internet is now forever polluted with images made by AI,” Mike Cook, an AI researcher at King’s College London, told my colleague Will Douglas Heaven in his recent piece on the future of generative AI models.
“The images we made in 2022 will be part of any model that is made from now on.”