Better HN

0 points · tensor · 2mo ago · 0 comments

I still find the idea that "learning" from code is "stealing" kind of ridiculous.

28 comments · 10 top-level

boh · 2mo ago · 6 in thread

Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".

The mental calisthenics required to justify this stuff must be exhausting.

idle_zealot · 2mo ago

> The mental calisthenics required to justify this stuff must be exhausting.

It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.

balamatom · 1mo ago

Correct.

Knowledge consists of, roughly speaking, thoughts.

(a "justified true belief" - per https://plato.stanford.edu/entries/knowledge-analysis/ - is a kind of thought)

The "thinking" part of a "thinking being" - that also consists of thoughts.

If your knowledges are someone's property, you are someone's property.

A society where all knowledge is proprietary, is a society of ubiquitous slavery.

Maybe multi-layered, maybe fractional, maybe with a smiley-face drawn on top.

Doesn't matter.

spankalee · 2mo ago

Humans have been known to recite entire parts from plays from memory, live in front of audiences even.

leni536 · 1mo ago

And they are legally required to license the play to do that, if it's still in copyright.

Dylan16807 · 1mo ago

> Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".

Are you finding people that actually say this?

When it can quote something like that, it's a training error. A popular enough work gets quoted and copied by people online, and then it's not properly deduplicated. It's a very small fraction of works it can do that with, and the cleaner your data the less it happens.

I'll once again quote that Stable Diffusion launched with fewer weights than training images. It had some accidental memorizations, but there wasn't room for its core functionality to be memorization-based.

Eisenstein · 1mo ago

This is a perfect example of 'begging the question'. Arriving at a conclusion from a fact assumed as true without evidence. Your reductio does not actually demonstrate that copyright applies to LLMs, because you did not demonstrate how transcoding is comparable to inference, just that LLMs can reproduce some passages from copyrighted works. You could also produce passages from copyrighted works by generating enough random sequences of words, but no one is arguing that is comparable to transcoding. That the people who do not share this conclusion are engaging in motivated reasoning is based only on your assumption and has no logical backing, and is therefore begging the question.

estimator7292 · 2mo ago · 5 in thread

Learning, probably not.

Copy/pasting at scale, yes

vorticalbox · 2mo ago

It is learning though. It’s not just copying the code.

Code gets turned into tokens and then it learns the next most likely token.

The issue that I see most people talk about is the scale at which it learns.

A human will learn from other people’s code but not from every person’s code.

cogman10 · 2mo ago

The issue is that of copyright law WRT derivative works. Machine transformations on original works do not create a new copyright for the person that directed the machine transformation. That's why you can't pirate a bunch of media by simply adding a red pixel to the right-hand corner or by color shifting the video.

Copyright law is very clear that if a machine does it, the original copyright on the input is kept. This is why your distributed binaries are still copyrighted, because the machine transformed, very significantly, the source code into binary which maintains the copyright throughout.

It would be inconsistent for the courts to suddenly decide that "actually, this specific type of machine transformation is actually innovative."

I know this is generally really bad for the AI industry, so they just ignore it until a court tells them they can't anymore. And they might get away with it as I don't have faith that the courts will be consistent.

ell1e · 1mo ago

LLMs seem to be so devoid of intelligence, I think it's arguable if that's learning: https://machinelearning.apple.com/research/illusion-of-think... Typically, you would imply a level of understanding when you say learning. LLMs apparently can't do that, by design.

blks · 2mo ago

A human is not a commercial product. Here we have a commercial product that was created by using a lot of various copyrighted and protected IP, without licensing agreements, without paying, without even citing it.

margalabargala · 2mo ago

Copy/pasting at scale is how tons of software has been written for a long time, or have we all forgotten the jokes people used to make about StackOverflow?

array_key_first · 2mo ago · 4 in thread

The "learning" isn't learning really. I mean it might be, but if you define learning to be a human endeavor then AI can't learn.

It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans, that's a choice and usually a bad one.

tensor (OP) · 1mo ago

It's also perfectly reasonable to say it's ok for a program or machine to do the same thing as a human. This has been the basis for the technological revolution since the dawn of technology.

leereeves · 1mo ago

It's legal and perfectly reasonable for a human being to combine organic fuels with oxygen from the air to create energy and CO2. Any law restricting that would be the worst form of tyranny.

It would not be reasonable to allow machines to do that at unlimited scale without restrictions.

(Hopefully the fossil fuels industry won't draw inspiration from the legal arguments made by AI companies...)

aeon_ai · 2mo ago

If one defines 'flying' to be a bird's endeavor, then humans can't fly.

Now, if you'll excuse me, I need to catch a metal shuttle that chucks itself through the air on wings.

greendestiny · 1mo ago

Sure, as a word it can be broad; as a concept in our legal system it should be much more nuanced.

The relevant extension of your analogy is: should birds be required to obey FAA rules? Or should plane factories be protected as nesting sites?

greendestiny · 1mo ago · 2 in thread

I think that it's absurd that we've jumped to the conclusion that backpropagation in neural networks should be legally treated the same as human learning.

I mean I don't think I could find a better description for following the derivatives of error in reproducing a set of works than creating a "derivative work".

alok-g · 1mo ago

>> ... we've jumped to the conclusion backpropagation in neural networks should be legally treated the same as human learning.

I agree. However, the reverse is also likely true, i.e., it cannot currently be denied that learning in humans is different from learning in artificial neural networks from the point of view of production of works that mix ideas/memes from several works processed/read. Surely, as the article says, copyright law talks exclusively about humans, not machines, not animals.

greendestiny · 1mo ago

I understand the article - the point about 'learning' is that if the model and its outputs are derivative works then the copyright belongs to the human creators of the works it was trained on.

Edit*: Or perhaps put more pseudo-legally, that the created works infringe on the copyrights of the original human creators.

pydry · 2mo ago · 1 in thread

If you can set a copyright trap and an LLM reproduces it I think it's pretty clear cut that it's more than just "learning".

I have seen LLMs do all sorts of crap which was clearly reproduction of training material.

This is also why people are most impressed with how much better it is at reproducing boilerplate rather than, say, imaginative new ideas.

jakeydus · 2mo ago

Remember last year (?) when one of the major AIs produced a bit of code that included Jeff Geerling's name in a comment?

pessimizer · 1mo ago

"Learning" for LLMs is just as goofy and propagandistic a metaphor as "stealing" for copyright. I find it predictive of your position that you'll accept one dumb metaphor for something that we didn't need a metaphor for, but not the other.

Are you for stealing and against learning?

We know exactly what is happening in both cases. We can talk about that, or we can use obfuscating euphemisms that make our preferred position seem obviously true.

nkrisc · 2mo ago

I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.

charonn0 · 1mo ago

Is "learning" the correct term?

Or is it "plagiarism"?

lo_zamoyski · 2mo ago

If that were the case, then imagine having to give it back!

MagicMoonlight · 2mo ago

If I “learned” your essay and handed it in, would you be happy with that?
