0 points · drvortex · 3y ago · 0 comments

Your code is not in that thing. That thing has merely read your code and adjusted its own generative code.

It is not directly using your code any more than programmers are using print statements. A book can be copyrighted, the vocabulary of language cannot. A particular program can be copyrighted, but snippets of it cannot, especially when they are used in a different context.

And that is why this lawsuit is dead on arrival.

12 comments · 12 top-level

klabb3 · 3y ago

> Your code is not in that thing. That thing has merely read your code and adjusted its own generative code.

This is kinda smug, because it overcomplicates things for no reason, and only serves as a faux technocentric strawman. It just muddies the waters for a sane discussion of the topic, which people can participate in without a CS degree.

The AI models of today are very simple to explain: each is a product built from code (already regulated, produced by the implementors) and source data (usually works that are protected by copyright and produced by other people). It would be a different product if it hadn't used the training data.

The fact that some outputs are similar enough to source data is circumstantial, and not important other than for small snippets. The elephant in the room is the act of using source data to produce the product, and whether the right to decide that lies with the (already copyright protected) creator or not. That's not something to dismiss.

xtracto · 3y ago

Say you publish a song and copyright it. Then I record it and save it in .xz format. It's not an MP3; it is not an audio file. Say I split it into N chunks and share them with N different people. Or with the same people, but on N different dates. Say I charge them $10 a month for doing that, and I don't pay you anything.

Am I violating your copyright? Are you entitled to do that?

To make it funnier: Say instead of the .xz, I "compress" it via π compression [1]. So what I share with you is a pair of π indices and data lengths for each of them, from which you can "reconstruct" the audio. Am I illegally violating your copyrights by sharing that?

[1] https://github.com/philipl/pifs
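
A minimal sketch of the "π indices and data lengths" idea, under stated assumptions: it searches the decimal digits of π (computed here with Machin's formula) for each byte written as three decimal digits. This is not how pifs itself is implemented; the function names `pi_fraction_digits`, `encode`, and `decode` and the sample payload are hypothetical.

```python
def arctan_inv(x, unity):
    """arctan(1/x) scaled by `unity`, using the alternating Taylor series on integers."""
    term = unity // x
    total = term
    x2 = x * x
    n, sign = 3, -1
    while term:
        term //= x2
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def pi_fraction_digits(n):
    """First n decimal digits of pi after the decimal point (Machin's formula)."""
    guard = 10  # extra digits to absorb integer truncation error
    unity = 10 ** (n + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi_scaled // 10 ** guard)[1:]  # drop the leading "3"


def encode(data, digits):
    """Replace each byte with the (offset, length) of its 3-digit form in `digits`."""
    pairs = []
    for byte in data:
        needle = f"{byte:03d}"
        offset = digits.find(needle)
        if offset < 0:
            raise ValueError(f"{needle} not found; more digits of pi are needed")
        pairs.append((offset, len(needle)))
    return pairs


def decode(pairs, digits):
    """Rebuild the bytes by reading each slice back out of the digits of pi."""
    return bytes(int(digits[offset:offset + length]) for offset, length in pairs)


digits = pi_fraction_digits(20000)
payload = b"song"  # hypothetical stand-in for the audio data
pairs = encode(payload, digits)
restored = decode(pairs, digits)
```

The list of pairs contains none of the original bytes, yet anyone with the digits of π can rebuild the payload from it, which is the point of the thought experiment. The pairs are generally no smaller than the data they describe, so this is "compression" only in quotation marks.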

andrewmcwatters · 3y ago

This is demonstrably false. It is a system outputting character-for-character repository code.[1]

[1]: https://news.ycombinator.com/item?id=33457517

Cort3z · 3y ago

Just to be clear: I cannot prove that they have used my code, but for the sake of argument, let's assume so.

They would have directly used my code when they trained the thing. I see it as the equivalent of creating a zip file. My code is not directly in the zip file either. Only by the act of unzipping does it come back, which requires a sequence of math steps.
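
A small sketch of the zip analogy, using Python's standard `zlib` module; the sample `source` text is a hypothetical stand-in for "my code". The compressed bytes typically show no readable copy of the text, but decompression returns it byte for byte.

```python
import zlib

# Hypothetical stand-in for "my code", repeated so there is redundancy to compress.
source = ("def greet(name):\n    return 'hello, ' + name\n" * 20).encode("utf-8")

# Compress at the highest level; the result is an opaque byte string.
archive = zlib.compress(source, 9)

# Check whether a readable fragment of the code survives in the archive.
snippet_visible = b"def greet" in archive

# The "sequence of math steps" that brings the code back.
restored = zlib.decompress(archive)

print(len(source), len(archive), snippet_visible, restored == source)
```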

heavyset_go · 3y ago

Neural nets can and do encode and compress the information they're trained on, and can regurgitate it given the right inputs. It is very likely that someone's code is in that neural net, encoded/compressed/however you want to look at it, which Copilot doesn't have a license to distribute.

You can easily see this happen, the regurgitation of training data, in an overfitted neural net.
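
A toy illustration of regurgitation, with a loud caveat: this is not a neural net. It is a character-level lookup model with a long context window, used here as a simple stand-in for an overfitted model. The training sample and the names `train` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train(text, order):
    """Count which character follows each context of `order` characters."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def generate(model, prompt, order, max_len):
    """Greedily extend `prompt` with the most frequent next character."""
    out = prompt
    while len(out) < max_len:
        followers = model.get(out[-order:])
        if not followers:
            break  # context never seen during training
        out += followers.most_common(1)[0][0]
    return out


# Hypothetical single training sample; with so little data the model can only memorize.
training_text = (
    "def fast_inverse_sqrt(x):\n"
    "    return x ** -0.5  # hypothetical training sample\n"
)
order = 8
model = train(training_text, order)

# Prompt with the first few characters and let the model continue.
completion = generate(model, training_text[:order], order, len(training_text))
print(completion == training_text)
```

Given only the opening characters as a prompt, the model reproduces the rest of its training sample, because every context it has seen has exactly one continuation.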

vkou · 3y ago

> It is not directly using your code any more than programmers are using print statements. A book can be copyrighted, the vocabulary of language cannot. A particular program can be copyrighted, but snippets of it cannot, especially when they are used in a different context.

So what? Why shouldn't we update the rules of copyright to catch up to advances in technology?

Prior to the invention of the printing press, we didn't have copyright law. Nobody could stop you from taking any book you liked, and paying a scribe to reproduce it, word for word, over and over again. You could then lend, gift, or sell those copies.

The printing press introduced nothing novel to this process! It simply increased the rate at which ink could be put to pages. And yet, in response to its invention, copyright law was created, banning the most obvious and simple application of this new technology.

I think it's entirely reasonable for copyright law to be updated, to ban the most obvious and simple application of this new technology, both for generating images, and code.

civilized · 3y ago

> Your code is not in that thing. That thing has merely read your code and adjusted its own generative code.

Completely incorrect. False dichotomy. It's widely known that AI can and does memorize things just like humans do. Memorization isn't a defense to violating copyright, and calling memorization "adjusting a generative model" doesn't make it stop being memorization.

If you memorized Microsoft's code in your brain while working there and exfiltrated it, the fact that it passed through your brain wouldn't be a defense. Substituting "generative model" for "brain" and the fact that it's a tool used by third parties doesn't change this.

moralestapia · 3y ago

Whatever you say man :^)

https://twitter.com/docsparse/status/1581461734665367554

NicoleJO · 3y ago

You're wrong. See exposed code. https://justoutsourcing.blogspot.com/2022/03/gpts-plagiarism...

lamontcg · 3y ago

> but snippets of it cannot

Yeah they can, and the whole functions that Copilot spits out are quite obviously covered by copyright.

> especially when they are used in a different context.

That doesn't matter.

ouid · 3y ago

It is essentially a weighted sum of your code and other copyright holders' code. Do not let the mystique of AI fool you. Copilot does not learn, it glues.

tevon · 3y ago

I agree.

If I read JRR Tolkien and then go and write a fantasy novel following an unexpected hero on his dangerous quest to undo evil, I haven't infringed, even if I use some of Tolkien's better turns of phrase.
