miki123211 1y ago

> This is obviously extremely silly, because that's exactly how OpenAI got all of its training data

IANAL, but it is worth noting here that DeepSeek has explicitly consented to a license that doesn't allow them to do this. That is a condition of using ChatGPT and the OpenAI API.

Even if the courts affirm that there's a fair use defence for AI training, DeepSeek may still be in the wrong here, not because of copyright infringement, but because of a breach of contract.

I don't think OpenAI would have much of a problem if you train your model on data scraped from the internet, some of which incidentally ends up being generated by ChatGPT.

Compare this to training AI models on Kindle Books randomly scraped off the internet, versus making a Kindle account, agreeing to the Kindle ToS, buying some books, breaking Amazon's DRM and then training your AI on that. What DeepSeek did is more analogous to the latter than the former.

anon373839 1y ago

> DeepSeek has explicitly consented to a license that doesn't allow them to do this.

You actually don’t know this. Even if it were true that they used OpenAI outputs (and I’m very doubtful) it’s not necessary to sign an agreement with OpenAI to get API outputs. You simply acquire them from an intermediary, so that you have no contractual relationship with OpenAI to begin with.

shishy 1y ago

I figured those contracts with an intermediary would extend to anyone they re-sell to, or prohibit them from re-selling...

fdsjgfklsfd 1y ago

You are free to publish your conversations with ChatGPT on the Internet, where they can be picked up by scrapers. The US ruled that they are not covered by copyright...

krust 1y ago

> IANAL, but it is worth noting here that DeepSeek has explicitly consented to a license that doesn't allow them to do this. That is a condition of using ChatGPT and the OpenAI API.

I have some news for you

dmitrygr 1y ago

> DeepSeek has explicitly consented to a license that doesn't allow them to do this.

By existing in the USA, OpenAI consented to comply with copyright law, and how did that go?

blibble 1y ago

training is either fair use, or it isn't

OpenAI can't have it both ways

chefandy 1y ago

Right, but it was never about doing the right thing for humanity, it was about doing the right thing for their profits.

Like I’ve said time and time again, nobody in this space gives a fuck about anyone that isn’t directly contributing money to their bottom line at that particular instant. The fundamental idea is selfish, damages the fundamental machinery that makes the internet useful by penalizing people that actually make things, and will never, ever do anything for the greater good if it even stands a chance of reducing their standing in this ridiculously overhyped market. Giving people free access to what is for all intents and purposes a black box is not “open” anything, is no more free (as in speech) than Slack is, and all of this is obviously them selling a product at a huge loss to put competing media out of business and grab market share.

miki123211 (OP) 1y ago

The issue here is breach of contract, not copyright.

glooglork 1y ago

It's quite unlikely that OpenAI didn't break any TOS with all the data they used for training their models. Not just OpenAI but all companies that are developing LLMs.

IMO, it would look bad for OpenAI to push strongly with this story; it would look like they're losing the technological edge and are now looking for other ways to make sure they remain on top.

boppo1 1y ago

Interesting that Trump signalled positively for DeepSeek. Said something like 'American companies need to wake up'. Has Sam not paid the piper yet?

staticman2 1y ago

Similar to how a patent contract becomes void when a patent expires regardless of what the terms of the contract say, it's not clear to me OpenAI can enforce a contract provision for an API output they own no copyright in.

Since they have no intellectual property rights in the output, it's not clear to me they have a cause of action to sue over how the output is used.

I wonder if any lawyers have written about this topic.

prmoustache 1y ago

What makes you think they had a contract with them in the first place? You can use OpenAI through intermediaries/proxies.

WolfRazu 1y ago

I assume all those intermediaries have to pass on the same ToS to their customers; otherwise that seems like a very unusual move.

fdsjgfklsfd 1y ago

How many thousands or millions of contracts has OpenAI breached by scraping data off of websites that have terms of service explicitly saying not to scrape data off their websites?

avs733 1y ago

They can sure try though, and I would be damned surprised if this wasn’t related to Sam’s event with Trump last week.

windexh8er 1y ago

"Free for me, not for thee!" - Sam Altman /s

But in all reality I'm happy to see this day. The fact that OpenAI ripped off everyone and everything they could and, to this day, pretend like they didn't, is fantastic.

Sam Altman is a con and it's not surprising that, given all the positive press DeepSeek got, there was a full-court assault on them within 48 hours.

freen 1y ago

Did OpenAI abide by my service’s terms of service when it ingested my data?

cortesoft 1y ago

Did OpenAI have to sign up for your service to gain access?

lolinder 1y ago

It probably ignored hundreds of thousands of "by using this site you consent to our Terms and Conditions" notices, many of which probably would be read as prohibiting training. But that's also a great example of why these implicit contracts don't really work as contracts.

otherme123 1y ago

OpenAI scraped my blog so aggressively that I had to ban their IPs. They ignored the robots.txt (which is kind of a ToS) by 2 orders of magnitude, and they ignored the explicit ToS that I copypasted blindly from somewhere but which, it turns out, forbids what they did (something like you can't make money with the content). Not that I'm going to enforce it, but they should at least shut up.

freen 1y ago

Civil law is only available to deep pockets.

Contracts are enforceable to the degree to which you can pay lawyers to enforce them.

I will run out of money trying to enforce my terms of service against OpenAI, while they have a massive war chest to enforce theirs.

Ain’t libertarianism great?

bayindirh 1y ago

No, but some of the data is licensed.

For example, my digital garden is under GFDL, and my blog is CC BY-NC-SA. IOW, they can't remix my digital garden with any other license than GFDL, and they have to credit me if they remix my blog, and can't use it for any commercial endeavor, which OpenAI certainly does now.

So, by scraping my webpages, they agree to my licensing of my data. So they're de-facto breaching my licenses, but they cry "fair-use".

If I tell them that they're breaching the license terms, they'd laugh at me, and maybe give me 2 cents of API access to mock me further. When somebody allegedly uses their API with their unenforceable ToS, they scream like an agitated cockatoo (which is an insult to the cockatoo, BTW. They're devilishly intelligent birds).

Drinking their own poison was mildly painful, I guess...

BTW, I don't believe that DeepSeek has copied/used OpenAI models' outputs or training data to train theirs. Even if they did, "the cat is out of the bag", "they did something amazing so they needed no permissions", "they moved fast and broke things", and "all is fair-use because it's just research" regardless of how they did it.

Heh.

Ukv 1y ago

> So, by scraping my webpages, they agree to my licensing of my data.

If the fair use defense holds up, they didn't need a license to scrape your webpage. A contract should still apply if you only showed your content to people who've agreed to it.

> and "all is fair-use because it's just research"

Fair use is a defense to copyright infringement, not breach of contract. You can use contracts, like NDAs, to protect even non-copyright-eligible information.

Morally I'd prefer what DeepSeek allegedly did to be legal, but to my understanding there is a good chance that OpenAI is found legally in the right on both sides.

addicted 1y ago

They probably did to access the NYTimes articles.

outside1234 1y ago

That isn't required to be in violation of copyright

freen 1y ago

Actually, yes, they actively agreed to them. Clicked the button and everything.

baq 1y ago

Have their scraping bots consented to cookies?

thorncorona 1y ago

Can you steal someone else’s laptop if they stood up to get a drink?

addicted 1y ago

OpenAI itself has argued, to the degree that your analogy applies, that if the goal of stealing the laptop is to train AI then the answer is Yes.

cortesoft 1y ago

Wouldn't this analogy be more like, "can you read my laptop screen if I stood up to get a drink?"

gizajob 1y ago

If their OS is open to the internet and you can scrape it and copy it off while they’re gone, then that would be about the right analogy. And OpenAI and DeepSeek have done the same thing in that case.

secstate 1y ago

Yes, if you can pay off any witnesses.

rpastuszak 1y ago

What?

dartos 1y ago

TOS are not contracts.

lolinder 1y ago

Citation? My understanding was that they are, provided that someone has to affirmatively accept them in order to use your site. So Terms of Service stuck at the bottom in the footer likely would not count as a contract because there's no consent, but Terms of Service included in a check box on a login form likely would count.

But IANAL, so if you have a citation that says otherwise I'd be happy to see it!

addicted 1y ago

You don’t need a citation.

You just need to read OpenAI’s arguments about why TOS and copyright laws don’t apply to them when they’re training on other people’s copyrighted and TOS protected data and running roughshod over every legal protection.

xdennis 1y ago

IANAGL, but in Germany a ToS is not a contract and can be declared void if it's deemed by courts to be unfair.

vanviegen 1y ago

Yes, though this is especially true when it's consumers 'agreeing' to the TOS. Anything even somewhat surprising within such a TOS is basically thrown out the window in European courtrooms without a second look.

For actual, legally binding consent, you'll need to make some real effort to make sure the consumer understands what they are agreeing to.

Spooky23 1y ago

People here will argue that. But the Chinese DNGAF.

like_any_other 1y ago

Legally, I understand your point, but morally, I find it repellent that a breach of contract (especially terms-of-service) could be considered more important than a breach of law. Especially since simply existing in modern society requires us to "agree" to dozens of such "contracts" daily.

I hope voters and governments put a long-overdue stop to this cancer of contract-maximalism that has given us such benefits as mandatory arbitration, anti-benchmarking, general circumvention of consumer rights, or, in this case, blatantly anti-competitive terms, by effectively banning reverse-engineering (i.e. examining how something works, i.e. mandating that we live in ignorance).

Because if they don't, laws will slowly become irrelevant, and our lives governed by one-sided contracts.

anothernewdude 1y ago

It's not hard to get someone else to submit queries and post the results, without agreeing to the license.
