
0 points · Gormo · 23d ago · 0 comments

> This is extremely false. Copyright additionally grants you exclusive control over the production and distribution of derivative works.

A derivative work is a work that itself includes copyrighted content from the original work.

That is to say that for something to be a derivative work, some measure of its content must be "CTRL-C, CTRL-V" from the originating work.

Something that's merely inspired by another work, or draws underlying themes or factual knowledge from it, is not a derivative work.

> A training set is just an anthology,

Which might make the training set itself a derivative work, but works created by using the model trained on that anthology are a different matter.

> and the training process is condensation.

No, it isn't. It's the creation of a new work that represents patterns extrapolated or interpolated from the data set, without the resulting model actually including any of the copyrighted elements of the work.

The underlying ideas and facts in the original work were never protected by copyright. Only the specific fixed form of expression is copyrightable.

Someone who looks at a dozen code examples in public repos to learn how to do e.g. a quick sort, then upon understanding the logic flow of the quick sort algorithm, writes his own quick sort implementation is not creating a derivative work of the code in the repos he examined. And the way LLMs work is much more similar to that process than to the "compressed anthology" concept you're describing.


4 comments · 1 top-level

rspeele · 23d ago · 3 in thread

> A derivative work is a work that itself includes copyrighted content from the original work.

If you put a GPL C program through Emscripten to run in a browser the output doesn't include the original C code but it's surely a derivative work.

> Someone who looks at a dozen code examples in public repos to learn how to do e.g. a quick sort, then upon understanding the logic flow of the quick sort algorithm, writes his own quick sort implementation is not creating a derivative work of the code in the repos he examined. And the way LLMs work is much more similar to that process than to the "compressed anthology" concept you're describing.

This is undoubtedly the core of the disagreement. Humans can learn from what they have seen, appreciate it, understand it, and draw on that experience in what they create. They do this without being considered ripoff artists, so why not machines that simulate the "same" thing automatically?

To me the answer is simply that humans are special. Human thought and human effort make it creativity when a human does it, copying when a machine does it. It's a double standard I am perfectly willing to accept. I am unabashedly biased in this regard.

That may seem remarkably unfair to the machines, or like a cop-out. I just carved out a hardcoded special case for humans, and my whole philosophical reasoning is "because I said so". But how fair do we want to be? After all, if you want to treat a machine exactly like a human who learns from prior art to create new art, then the ownership of the new art would also belong to the machine. Not to the person who prompts it.

Gormo (OP) · 22d ago

> If you put a GPL C program through Emscripten to run in a browser the output doesn't include the original C code but it's surely a derivative work.

Because it does include content from the original work -- this is just a translation, and isn't comparable to how LLMs work.

> To me the answer is simply that humans are special.

I don't disagree, but I also view LLMs as tools that extend human capacities and not autonomous entities unto themselves. LLMs are still just software, and can't really be regarded as anything other than instruments that humans use to broaden their capacity to see, appreciate, understand, and draw on that experience in what they create.

> That may seem remarkably unfair to the machines, or like a cop-out.

No, it's unfair to the humans. The machines are just tools that they use. The "double standard" is really a set of inconsistent standards applied to the same underlying moral agents.

> After all, if you want to treat a machine exactly like a human who learns from prior art to create new art, then the ownership of the new art would also belong to the machine. Not to the person who prompts it.

No, it always belongs to the person who prompts it. The machine is not a conscious entity, bears no intentions, and has no capacity to act on its own initiative. The machine is always just a tool that extends human capacity, as all machines always have.

For a good comparison here, we've never not credited a photographer as the author of a photograph. But the photographer is in a sense merely prompting the camera by framing the shot, selecting the exposure, adjusting the lighting, etc. -- the hard work in actually creating the photograph is being done by the camera itself, with the photographer playing no role in directly constructing the final image, and with many of the qualities of the final image being determined by pre-existing features of the camera's functional design and components that the photographer also played no role in defining, apart from choosing which camera to use.

LLMs are like cameras in this way. And the fact that they rely on external data for model training no more disclaims the user as the author of the resulting work than looking things up in a dictionary or encyclopedia does the same for the author of an essay.

rspeele · 21d ago

The camera analogy is a good one but I have never had a camera that had every great picture somebody else had taken, plus every work of art, baked into it. They only captured what they were aimed at directly by the user. Well, maybe next time I upgrade my phone that will not be the case since they now have built-in AI "enhancement" of photos.

I agree with the framing of the AI as a tool not an autonomous entity. The thing is, to me, it is exactly that framing that makes it so the use of that tool means "copying" more than it means "learning and taking inspiration and creating new art", because who is doing the learning and being inspired? The person who types "make me a 3d arena FPS" certainly didn't do any learning from the Quake source code. The AI itself, being just a program, can't take credit.

I think of a trained AI like a lossy, highly compressed copy of its training data set. AI companies charge for access to decompress targeted pieces of that copy and the lossiness makes that decompression interesting and "new". But normally I can't charge for access to other people's stuff even if the access is highly lossy, like a camcorder bootleg.


ranger_danger · 23d ago

Perhaps the future will be less Idiocracy and more Futurama, with humans and robots living socially together.
