logicchains 2y ago

> it would be as if I would copy parts of other proprietary code and copy paste it into my own codebase.

It's not copy-pasted; it's compressed in a lossy manner. Even GPT-4 has nowhere near enough memory to store the entirety of its training data in a lossless compression format. Just like how humans compress the information we read.

8 comments · 5 top-level

jamiek88 2y ago · 2 in thread

If it’s lossy compressed how come they have verbatim content from NYT in there that’s easy to recall? That’s what the lawsuit is about.

anon291 2y ago

Many humans have photographic memories. Not common, but not unheard of for people to be able to memorize long portions of text verbatim.

For example, the Wikipedia article https://en.wikipedia.org/wiki/List_of_people_claimed_to_poss... contains several examples of people who were able to look at pages and recite them back. That is actually a much stronger ability than GPT's, since GPT has presumably looked at them 100 times.

amelius 2y ago

Yes, and a car is a fast horse. Your argument does not tell us anything about whether or not GPT should be legal.

Laws are created by people (not by computers reasoning that all analogies must be true). And fairness is an important part of that process.

mihaic 2y ago · 1 in thread

If you have a copyrighted photo that I simply put through JPEG compression, am I legally allowed to use that?

Software programs are not humans, and need to be treated differently. Anthropomorphization is one of the slipperiest paths to argue anything.

kromem 2y ago

It depends on how much is reproducible and what the use is.

If only small patches of the original image can be reproduced then it becomes much more murky.

lacrimacida 2y ago

> Just like how humans compress the information we read.

Humans don’t have the scale machines have, and moreover humans aren’t services. That argument doesn’t fly.

I really think NYT’s data isn’t that important or crucial; LLMs could’ve just elided it. However, it’s more about training on copyrighted data in general, which is kind of crucial for OpenAI. They trained their LLMs indiscriminately on copyrighted content without any plan to share any profits.

wouldbecouldbe 2y ago

You're kind of proving my comment by pretending they are akin to a human brain instead of an evolved form of statistics mixed with code, aka a transformer model.

Let alone that it's a centralised model that's being distributed for a fee.

wouldbecouldbe 2y ago

So if I compress nytimes articles into a vector database and query it as a vector, then that's okay, in line with your reasoning?

