> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.
> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.
So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.
Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.
And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]
[0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...
[1] https://help.openai.com/en/articles/8809935-how-to-delete-an...
Then again, I’m starting to think OpenAI is gathering a cult-leader-like following, where any negative comment results in devoted followers, or those with something to gain, immediately jumping to its defense no matter how flimsy the ground.
It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.
I am not an Open AI stan, but this needs to be responded to.
The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.
This is like saying "well I know they didn't want to go skydiving but we forced them to go skydiving and they died because they had a stroke mid-air; it's their fault they died."
Anyone who makes promises about data security is at best incompetent and at worst dishonest.
Shouldn't that be "at best dishonest and at worst incompetent"?
I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?
I don't think the Judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.
Whether or not you care is not relevant, and that is usually the case for customers. If a drug company resold an expensive cancer drug without IP, you might say 'their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP.'
If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?
> (jeopardizes)
... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.
In theory it is possible to apply (it's mentioned in multiple places in the documentation), but in practice requests are simply ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.
We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.
Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?
OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.
Product development?
Right but the problem they're having is that the request is ignored.
https://openai.com/en-GB/policies/row-privacy-policy/
1. You can request it but there is no promise the request will be granted.
Defaults matter. Silicon Valley's defaults are not designed for privacy. They are designed for profit. OpenAI's default is retention. Outputs are saved by default.
It is difficult to take seriously the arguments in their memo in support of their objection to the preservation order. OpenAI already preserves outputs by default.
What's the betting that they just write it on the website and never actually implemented it?
After all, since the NYT has a very limited corpus of information, and supposedly people are generating infringing content using their APIs, said hashes can be used to compare whether such content has been generated.
I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if NYT is requesting far more, or alleging some indirect form of infringement which would invalidate my proposal.
[1] https://ssdeep-project.github.io/ssdeep/index.html
[2] https://joshleeb.com/posts/content-defined-chunking.html
For example, the judge seems to have asked if it would be possible to segregate data that the users wanted deleted from other data, but OpenAI has failed to answer. Not just denied the request, but simply ignored it.
I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.
Maybe I'm alone, but a pinkie promise from Sam Altman gives me no assurances about my data. It's about as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.
It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.
Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo; if they had to tap links, I doubt there's a ton of voluntary cooperation.
I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.
On the contrary.
>Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me.
I think you're being unduly paranoid. /s
https://www.theverge.com/2024/6/13/24178079/openai-board-pau...
https://www.wsj.com/tech/ai/the-real-story-behind-sam-altman...
Of course it's out of self-serving interests, but I find it hard to disagree with OpenAI on this one.
Third-party privacy and relevance is a constant point of contention in discovery. Exhibit A: this article.
(1) With limited, well-scoped exclusions for lawyers, medical records, etc.
https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-t...
In other words, they want everyone to be forced to follow the same rules they were forced to follow 20 years ago.
I wonder if the laws and legal procedures are written considering this general assumption that a party to a lawsuit will naturally lie if it is in their interest. And then I read articles and comments about a "trust based society"...
OpenAI slams court order to save all ChatGPT logs, including deleted chats - https://news.ycombinator.com/item?id=44185913 - June 2025 (878 comments)
Imagine how much worse it is for your LLM chat history to leak.
It's even worse than your private comms with humans because it's a raw look at how you are when you think you're alone, untempered by social expectations.
Why would a customer expect this not to be private? How can one even know how it could be used against them, when they don’t even know what’s being collected or gleaned from collected data?
I am following these issues closely, as I am terrified that my “assistant” will some day prevent me from obtaining employment, insurance, medical care, etc. And I’m just a non-law-breaking normie.
A current day example would be TX state authorities using third party social/ad data to identify potentially pregnant women along with ALPR data purchased from a third party to identify any who attempt to have an out of state abortion, so they can be prosecuted. Whatever you think about that law, it is terrifying that a shift in it could find arbitrary digital signals being used against you in this way.
It's that it's like watching how someone might treat a slave when they think they're alone. And how you might talk down to or up to something that looks like another person. And how pathetic you might act when it's not doing what you want. And what level of questions you outsource to an LLM. And what things you refuse to do yourself. And how petty the tasks might be, like workshopping a stupid twitter comment before you post it. And how you copied that long text from your distraught girlfriend and asked it for some response ideas. etc. etc. etc.
At the very least, I'd wager that it reveals that bit of true helpless patheticness inherent in all of us that we try so hard to hide.
Show me your LLM chat history and I will learn a lot about your personality. Nothing else compares.
To be fair the song was intense.
You make it sound like they're mad at you for no reason at all. How unreasonable of them when confronted with such honorable folks as yourselves!
The technology anarchists in this thread need perspective. This is fundamentally a case about the legality of this product. In the extreme case, this will render the whole product category of "llm trained on copyrighted content" illegal. In that case, you will have been part of a copyright infringement on a truly massive scale. The users of these tools do NOT deserve privacy in the light of the crimes alleged.
You do not get to claim to protect the privacy of the customers of your illegal venture.
Within "settings"? Is this referring to the dark pattern of providing users with a toggle "Improve model for everyone" that doesn't actually do anything? Instead users must submit a request manually on a hard to discover off-app portal, but this dark pattern has deceived them into think they don't need to look for it.
> OpenAI must process your request solely for the purpose of fulfilling it and not store your request or any responses it provides unless required under applicable laws. OpenAI also must not use your request to improve or train its models.
— https://www.apple.com/legal/privacy/data/en/chatgpt-extensio...
I wonder if we’ll end up seeing Apple dragged into this lawsuit. I’m sure after telling their users it’s private, they won’t be happy about everything getting logged, even if they do have that caveat in there about complying with laws.
The ZDR APIs are not and will not be logged. The linked page is clear about that.
It's just realism. Protect your private data yourself; relying on companies or governments to do it for you is, as the saying goes, letting a tiger devour you up to the neck and then asking it to stop at the head.
No you don't. You charge extra for privacy and list it as a feature on your enterprise plan. Not even paying Pro customers get "privacy". Also, you refuse to delete personal data included in your models and training data following numerous data protection requests.
It says here:
> If you are on a ChatGPT Plus, ChatGPT Pro or ChatGPT Free plan on a personal workspace, data sharing is enabled for you by default, however, you can opt out of using the data for training.
Enterprise is just opt out by default...
https://help.openai.com/en/articles/8983130-what-if-i-want-t...
And whether and how they use your data for their own purposes isn't touched by that either.
> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
That's a lot of words to say "yes, we are violating GDPR".
> Any judgment of a court or tribunal and any decision of an administrative authority of a third country requiring a controller or processor to transfer or disclose personal data may only be recognised or enforceable in any manner if based on an international agreement, such as a mutual legal assistance treaty, in force between the requesting third country and the Union or a Member State, without prejudice to other grounds for transfer pursuant to this Chapter.
So if, and only if, an agreement between the US and the EU allows it explicitly, it is legal. Otherwise it is not.
There's decades of legal disputes in some European countries on whether it's even legitimate for the government to mandate your ISP or phone company to collect metadata on you for after-the-fact law enforcement searches.
Looking at the actual data seems much more invasive than that and, in my (non-legally trained) estimate doesn't seem like it would stand a chance at least in higher courts.
Privacy mode (enforced across all seats):

- OpenAI: Zero-data-retention (approved)
- Anthropic: Zero-data-retention (approved)
- Google Vertex AI: Zero-data-retention (approved)
- xAI Grok: Zero-data-retention (approved)
did this just open another can of worms?
So nothing?
Do we know if the court order covers these?
I'm excited that the law is going to push for local models.
> This does not impact API customers who are using Zero Data Retention endpoints under our ZDR amendment.
If you don’t retain that data you’re destroying evidence for the case.
It’s not like the data is going to be given to anyone; it’s only going to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).
And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem. I saw NYT’s filing and it had very compelling evidence that you could get ChatGPT to distribute verbatim copyrighted text from the Times without citation.
> It’s not like the data is going to be given to anyone; it’s only going to be used for limited legal purposes for the lawsuit (as OpenAI confirms in this article).
Nobody other than both parties to the case, their lawyers, the court, and whatever case file storage system they use. In my view, that's already way too much given the amount and value of this data.
I don't believe you would be considered to be violating the GDPR if you are complying with another court order, because you are presumably making a best effort to comply with the GDPR besides that court order.
You're saying it's unreasonable to store data somewhere for a pending court case? Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information. That's ridiculous, if that was true then it would be impossible to perform discovery and get anything done in court.
Imagine a lawsuit against Signal that claimed some nefarious activity, harmful to the plaintiff, was occurring broadly in chats. The plaintiff can claim, like NYT, that it might be necessary to examine private chats in the future to make a determination about some aspect of the lawsuit, and the judge can then order Signal to find a way to retain all chats for potential review.
However you feel about OpenAI, this is not a good precedent for user privacy and security.
The court isn't saying "preserve this data forever and ever and compromise everyone's privacy," they're saying "preserve this data for the purposes of this court while we perform an investigation."
IMO, the NYT has a very good argument here that the only way to determine the scope of the copyright infringement is to analyze requests and responses made by every single customer. Like I said in my original comment, the remedies for copyright infringement are on a per-infringement basis. E.g., every time someone on LimeWire downloads Song 2 by Blur from your PC, you've committed one instance of copyright infringement. My interpretation is that NYT wants the court to find out how many times customers have received ChatGPT responses that include verbatim New York Times content.
The whole premise of the lawsuit is that they didn't do anything unlawful, so saying "just do what the NYT wanted you to do" isn't interesting.
The NYT made an argument to a judge about what they think is going on and how they think the copyright infringement is taking place and harming them. In their filings and hearings they present the reasoning and evidence they have that leads them to believe that a violation is occurring. The court makes a judgment on whether or not to order OpenAI to preserve and disclose information relevant to the case to the court.
It's not "just do what NYT wanted you to do," it's "do what the court orders you to do based on a lawsuit brought by a plaintiff and argued to the court."
I suggest you read the court filing: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...
I'm confused, how does this not affect Enterprise or Edu? They clearly possess the data, so what makes them different legally?
> When we appeared before the Magistrate Judge on May 27, the Court clarified that ChatGPT Enterprise is excluded from preservation.
How much could the NYT back catalog be worth? Just buy it, ask the Saudis.
If a company is subject to a US court order that violates EU law, the company could face legal consequences in the EU for non-compliance with EU law.
The GDPR mandates specific consent and legal bases for processing data, including sharing it.
Assuming it is legal to share it for legal purposes, one can't sufficiently anonymize the data. It needs to be accompanied by user data that allows requests to download it and for it to be deleted.
I wonder what the fine would be if they just delete it per user agreement.
I also wonder: could one, in the US, legally promise customers that they may delete their data, then choose to keep it indefinitely and share it with others?
The ruling and situation aside, to what degree is it possible to enforce something like this, and what are the penalties? Even in GDPR and other data protection cases, it seems super hard to enforce. Directives to keep or delete data basically require system-level access, because the company can always CRUD its data whenever it wants and whatever is in its best interest. Data can be ordered produced to a court periodically and audited, which could maybe catch an individual case, I guess. There is basically no way to know without literally seizing the servers in an extreme case. Also, the consequences in most cases are a fine.
A.k.a. the cost of doing business.
Could you with a straight face argue that the NYT newspaper could be a surrogate girlfriend for you like a GPT can be? They maintain that it is obviously a transformative use and therefore not an infringement of copyright. You and I may disagree with this assertion, but you can see how they could see this as baseless, ridiculous, and frivolous when their livelihoods depend on that being the case.
Given that it's not explicitly mentioned as data not being affected, I'm assuming it is.
https://arstechnica.com/tech-policy/2025/06/openai-says-cour...
> We are taking steps to comply at this time because we must follow the law, but The New York Times’ demand does not align with our privacy standards. That is why we’re challenging it.
So basically no, lol. I wonder if we'll see the GDPR go head-to-head with Copyright Law here, that would be way more fun than OpenAI v NYT.
> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.
That's horse shit and OpenAI knows it. It means no such thing. A legal hold is just a 'preservation order'. It says absolutely nothing about other access or use.
The GDPR does not say that you can never be proven to have done something wrong in a court of law.
A legal hold requires no such thing and there would be no such requirement in it. They are perfectly free to access and use it for any reason.
So user privacy is definitely implicated.
They are being challenged because NYT believes that ChatGPT was trained with copyrighted data.
NYT naturally pushes to find a way to prove that NYT data is being reproduced in user chats, and how often.
OpenAI spins that as the NYT invading user privacy.
It’s quite transparent as to what they are doing here.
The order the judge issued is irresponsible. Maybe ChatGPT did get too cute in its discovery responses, but the remedy isn’t to trample the rights of third parties.