New York Daily News and other outlets ask judge to reject OpenAI effort to keep deleting data

Lawyers for the Daily News, The New York Times and other news outlets suing ChatGPT’s parent company, OpenAI, have asked a Manhattan judge to reject an effort by the technology giant to continue deleting data that could prove it stole journalists’ work.

Manhattan Federal Magistrate Judge Ona Wang last month ordered OpenAI to preserve its output logs and any related information slated for deletion after the news outlets accused the tech company of permanently dumping enormous swaths of data, hindering their ability to prove AI products could circumvent paywalls to “plagiarize and regurgitate copyrighted content.”

OpenAI has asked Wang to vacate the order, arguing that continuing to store the data would be a “massive burden” and infringe on the privacy of users.

The news outlets say that the argument runs contrary to what OpenAI tells its users about being subject to retaining data if the law requires it. They have noted that the AI companies don’t deny the data deleted were pertinent to the lawsuit.

“What it does not dispute is that the output log data is relevant to the News Cases, which as OpenAI has long recognized, include infringement claims based on outputs generated by [OpenAI’s] models and products,” lawyers wrote Tuesday.

“Nor can it dispute that, as a highly sophisticated technology company that is currently valued at more than $300 billion, it has both the means and ability to preserve this concededly relevant data.”

The news outlets say that OpenAI has used every trick in the book to skirt accountability. In addition to the mass deletions, they have accused the tech company of installing filters “designed to make it harder” to elicit answers containing journalists’ copyrighted works.

“OpenAI’s preferred course of action to ‘protect its users’ data and privacy’ — immediately resuming mass deletions — will also, coincidentally, allow it to continue to destroy data that would show its liability for copyright infringement,” lawyers for the news outlets wrote.

Addressing privacy concerns, Wang’s May 13 order outlined that it was solely meant to preserve and segregate information that would not be provided “wholesale” to anyone — or stored “forever” — but used to address concerns raised in the suit.

If Wang is inclined to entertain the AI companies’ objection, the newspapers said she should give them a chance to analyze different populations of data and present findings to the court.

The suit alleges OpenAI has illegally harvested millions of news stories to train its large language models and build generative AI products that can vomit them out — or versions of them — to users. That has sometimes resulted in journalists’ pirated reporting being misstated or misrepresented, misinforming ChatGPT users, the newspapers have argued.

While the newspapers’ publishers have spent billions of dollars to send “real people to real places to report on real events in the real world,” the two tech firms are “purloining” the papers’ reporting without compensation “to create products that provide news and information plagiarized and stolen,” according to the lawsuit.

OpenAI has argued that the vast amount of data used to train its artificial intelligence bots is protected by “fair use” rules. The doctrine applies to rules that allow some to use copyrighted work for purposes like criticism, commentary, news reporting, teaching and research.

However, lawyers for the newspapers have argued that the fair use test involves transforming a copyrighted work into something new, and the new work cannot compete with the original in the same marketplace.

The judge has rejected OpenAI’s position that the newspapers haven’t produced “a shred of evidence” that people are using ChatGPT or OpenAI’s API products to get news instead of paying for it.

The newspapers noted Tuesday that engineers for the tech companies had all but admitted it themselves by acknowledging the chatbots weren’t designed to slip past paywalls — not that they couldn’t. They also cited another suit involving Google, in which an OpenAI engineer acknowledged local news was a “pretty common quer[y]” among ChatGPT users.

The Times originally brought the Manhattan Federal Court suit in December 2023.  The News, along with other newspapers in affiliated companies MediaNews Group and Tribune Publishing, filed in April 2024.

The other outlets included The Mercury News, The Denver Post, The Orange County Register and the St. Paul Pioneer Press, and Tribune Publishing’s Chicago Tribune, Orlando Sentinel and South Florida Sun Sentinel.

Lawyers for OpenAI did not respond to The News’ requests for comment.

Leave a Reply

Your email address will not be published. Required fields are marked *