“Nature abhors perpetual self-fertilization.”
– Charles Darwin (1868)
You know who’s really hungry? I mean, ravenously hungry?
There are at least two answers to that question. One answer is “me.” All the time. Once, when I was 19, I ate seven room-temperature leftover grilled-cheese sandwiches in one sitting. (There may have been marijuana involved.) But even 50 years later and stone sober, my appetite is not at peace with itself. Never mind me, however.
The other answer is: artificial intelligence.
In order to train itself to edge ever closer to human sentience — or at least be reliably analytical and predictive at super-human speed — AI demands to be fed mind-boggling quantities of data about every aspect of human activity and every example of human language it can swallow.
To say that data is the bread and butter of AI is to vastly understate the point. Have you ever been to an Old Country Buffet for the early-bird special? It’s like that, minus the creamed spinach and mobility scooters … and times a trillion. Because AI’s smorgasbord isn’t a downmarket dinner spot; it is an all-you-can-eat venue called “the internet.”
For the past three years, AI has been stuffing itself with every morsel it can scrape from everything ever published online, including text, audio, video and code. The feeding frenzy is staggering. As we learned in an extraordinary package of stories in Saturday’s New York Times, the major players in AI are voraciously consuming all the internet has to offer, often on thin legal ice on the question of intellectual property. They justify scraping other people’s content, which is crammed into their data sets, as “fair use” on the theory that it transforms the creative work into new forms of expression. As if I could enter your garage and steal your lawn mower, so long as I used it to decapitate half-buried cats. But never mind their brutish behavior. That’s not my point here.
My point here is that OpenAI (with partner Microsoft), Meta and Google are gorging themselves at such a pace, digesting hundreds of billions of web pages containing trillions of words, that within two years they will have run out. Run out of internet! It’s reminiscent of the Republic of Nauru, the South Pacific guano island that for the second half of the 20th century supplied much of the phosphates for the world’s chemical industry. Huge earth-moving machines scraped the crust off the small nation’s surface for shipping to the gaping maw of commerce. Then, as the millennium approached, Nauru ran out of seagull shit. “Yes, we have no bird guano, we have no bird guano today.”
That’s where AI is headed. Or if you prefer to stick with the buffet analogy: every chafing dish will be empty. (If you can’t imagine Twitter, Reddit and Substack as chafing dishes, I really can’t help you. I’m doing the best I can.)
In any event, empty is not a situation these companies can tolerate. Thus their solution: using AI content to train subsequent AI content. They call it “synthetic data.” I call it inbreeding.
As described by anthology editor Nancy Wilmsen Thornhill in The Natural History of Inbreeding and Outbreeding:
Inbreeding, the mating of close kin, and outbreeding, the mating of distant relatives or unrelated organisms, have long been important subjects to evolutionary biologists. Inbreeding reduces genetic diversity in a population, increasing the likelihood that genetic defects will become widespread and deprive a population of the diversity it may need to cope with its environment.
In humans, it leads to hemophilia, schizophrenia and birth defects, among other genetic maladies, which is why 150 years ago Darwin abhorred incestuous propagation as much as nature does. If you have ever read ChatGPT text or looked at AI-generated images, you’ve noticed small distortions of language and visual renderings — such as the fingers or nose in a “photograph.” Now, even if you struggled with the chafing-dish metaphor, you can surely imagine how those distortions, recycled as “content,” will themselves be machine-learned. Inevitably, recessive genes of corrupted content will mate with recessive genes of other corrupted content, and the lifeblood of the AI universe will be like that of European royalty.
Or, as The Times so gently expressed the danger: “So if companies use A.I. to train A.I., they can end up amplifying their own flaws.”
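The Times’s warning is easy to demonstrate in miniature. The toy simulation below (my own illustration, with made-up numbers, not any company’s actual pipeline) fits a simple statistical model to data, samples “synthetic” data from the fit, refits on that output, and repeats. Sampling error compounds generation after generation, and the model’s diversity, its fitted spread, drifts toward zero. Inbreeding, expressed in Python:

```python
import random
import statistics

def collapse_demo(generations=200, n=50, seed=0):
    """Toy illustration of recursive training on synthetic data:
    each generation fits a Gaussian (mean and spread) to the previous
    generation's synthetic output, then samples fresh data from that
    fit. Sampling noise compounds, and the fitted spread tends to
    wither toward zero: diversity is lost, flaws are amplified."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the "real data" distribution we start from
    history = [sigma]
    for _ in range(generations):
        # Generate a synthetic data set from the current model...
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # ...then refit the model on its own output.
        mu = statistics.fmean(sample)
        sigma = statistics.pstdev(sample)
        history.append(sigma)
    return history

hist = collapse_demo()
print(f"spread of the model: start={hist[0]:.3f}, end={hist[-1]:.3f}")
```

Run it a few times with different seeds; the spread almost always shrivels, because each generation can only echo, never replenish, what the last one produced.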
It all gets me to thinking about Stanley Kubrick’s 1968 sci-fi masterpiece 2001: A Space Odyssey, and the movie’s archvillain, HAL — as in the HAL 9000 onboard computer, itself a sentient machine. Deep into the spaceship’s Jupiter mission in search of alien intelligence, HAL goes rogue and homicidal. But why? The exact cause of his corruption was left ambiguous.
What I’m wondering, as someone who once came to ruin on tepid grilled cheese, is this: was it maybe something he ate?