Theft is not fair use
One of AI’s original sins (there are many) is scraping every corner of the internet for training data, sucking up text, photos, video, music and anything else tech companies can find online to feed the large language models they are racing to develop. Intellectual property rights and copyrights be damned: in true Silicon Valley style, AI companies are moving fast and breaking things. They need the data, so they take it and claim it’s transformative fair use. It’s not; it’s theft.

I’m a 2025 John S. Knight Journalism Fellow at Stanford studying the impacts of AI in the photojournalism industry. Tech companies’ willful pillaging of images off the internet to train their AI models without compensating photojournalists or the news organizations that employ them is one of the negative impacts.
Dozens of lawsuits have been filed by news publishers, the entertainment industry, authors, photographers and other creatives who objected to tech companies pickpocketing their copyrights under the guise of fair use. In February, Thomson Reuters scored an important first victory for copyright holders when a federal judge in Delaware ruled against Ross Intelligence, a now defunct AI legal-research startup that was claiming fair use of Reuters’ content.
The Reuters court decision put AI companies on notice that the transformative fair use argument they’ve bet at least part of their multibillion-dollar valuations on might be in jeopardy. They know they face potentially huge liability if more court cases are decided the same way. That’s why, when the Trump administration began developing and seeking input on the White House’s new Artificial Intelligence Action Plan, OpenAI and Google filed comments requesting that tech companies be granted free use of copyrighted material for AI training purposes.
Last week, a group of copyright expert law professors filed an amicus brief siding with authors suing Meta for copyright infringement. In the lawsuit, the authors accuse Meta of intentionally removing copyright management information to conceal its theft. The law professors poke holes in Meta’s claim of fair use, writing that “training use is also not ‘transformative’ because its purpose is to enable the creation of works that compete with the copied works in the same markets — a purpose that, when pursued by a for-profit company like Meta, also makes the use undeniably ‘commercial.’”
However, AI companies’ copyright liability does not end with their use of scraped training data. They are also liable when their generative models produce copyright-infringing outputs in response to user prompts. A lawsuit filed by The New York Times cites examples of problematic ChatGPT outputs that copy large portions of text from Times articles.

In another lawsuit, Getty Images accuses Stability AI of using millions of Getty’s copyrighted images in its training data. As proof, Getty submitted multiple images where a distorted version of the Getty watermark is included in the output from Stability AI.

Seeing the NYT and Getty lawsuits made me curious whether images created by me and my fellow photographers at the St. Louis Post-Dispatch had been appropriated as AI training data. In my own evaluation of an AI model, it took only six prompts to get ChatGPT to crank out a copyright infringement of my co-worker Robert Cohen’s iconic photo from the 2014 Ferguson uprising. The infringement was so blatant that when I showed the offending output next to the original during a presentation on AI, journalism and disinformation to about 100 people at the University of Richmond, the audience audibly gasped (see the side-by-side images at the top of this piece).
Many news organizations, including AP, The Wall Street Journal, Hearst and Lee Enterprises (the parent company of the St. Louis Post-Dispatch, from which I’m on leave), have struck licensing deals with AI companies that allow their copyrighted content to be used as training data. It’s hard to say for sure because there is little public disclosure of the deals, but I fear news publishers have signed agreements that undervalue their content. Courtney C. Radsch’s October 2024 article in Washington Monthly, “AI Needs Us More Than We Need It,” summarizes my concerns about these deals. “Without a constant stream of high-quality, human-made information, artificial intelligence models become useless. That’s why journalists and other content creators have more leverage over the future than they might know,” she wrote.
Tech companies have a history of taking advantage of legacy news organizations that are desperate for revenue, striking deals that offer short-term cash infusions but little long-term benefit. I fear AI companies will act as vampires, draining news organizations of their valuable content to train their new AI models and then riding off into the sunset with their multibillion-dollar valuations while the news organizations continue to teeter on the brink of bankruptcy. It wouldn’t be the first time tech companies out-maneuvered news organizations (online advertising) or lied to them (the pivot to video).
In my first blog post as a JSK Fellow, “Seeing is no longer believing: Artificial Intelligence’s impact on photojournalism,” I wrote about Content Credentials and its potential to help build public trust in news photography. Content Credentials also includes a checkbox asking AI companies not to train on an image. While Content Credentials won’t stop unethical copyright violations from happening, it will give copyright holders a way to trace who has stolen their work.
Sam Altman, CEO of OpenAI, openly complained that DeepSeek, a Chinese competitor to ChatGPT, may have taken ChatGPT’s IP without permission. I feel your pain, Sam. I’ve long been an advocate for photographers protecting their copyrights from corporate exploitation.
News organizations, photographers, writers and all copyright holders should be fairly compensated for the billions in valuation AI companies have achieved by stealing their content.
To all the tech companies and Silicon Valley bros planning to move fast and break things by infringing copyrights they think are protected by fair use: seek guidance from Mike Monteiro’s YouTube video, where he references the classic scene from Goodfellas. “F*ck you. Pay me.”