Sometimes they're so overfit that the compression isn't even lossy, and the data is encoded verbatim in the NN.
But I disagree with the underlying assumption that you can anthropomorphize LLMs. Gradient descent and backpropagation don't take place in the brain. LLMs "learn" in the same way that Excel sheets "learn".
Humans are living beings with needs and rights. A person being able to legally squat in a home doesn't mean that a drone occupying property for some amount of time also has squatter's rights, even though you could easily and affordably automate and scale the deployment of drones to live and hide away on properties long enough to attain rights regarding properties all over the country.
Backprop doesn't happen in us, but I think our neurones still do gradient descent: neurones that fire together, wire together.
And ultimately, at the deepest level we can analyse, our brains' atoms are doing quantum field diffusion equations, which you can also do in an Excel spreadsheet, so that kind of reductionism doesn't help either.
> Humans are living beings with needs and rights. A person being able to legally squat in a home doesn't mean that a drone occupying property for some amount of time also has squatter's rights, even though you could easily and affordably automate and scale the deployment of drones to live and hide away on properties long enough to attain rights regarding properties all over the country.
Yes, but we can also do tissue cultures and crude bioprinting, so it's easy to foresee a future where exactly the same argument also applies to living organisms rather than digital minds.
We need to figure out what the deeper rules are that lead to the status quo, not merely mimic the superficial result. The latter is how cargo cults function.
Not exactly, no, but the 'neurons that fire together wire together' way of learning has a pretty similar effect.
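For anyone unfamiliar with the rule being referenced, the textbook Hebbian update strengthens a connection in proportion to the product of pre- and postsynaptic activity, Δw = η · x · y. Below is a minimal sketch in Python, assuming 0/1 activities and an arbitrary learning rate; `hebbian_update` and the event list are hypothetical and a cartoon, not a model of real synapses. Each update uses only locally available activity, with no error signal propagated backward, which is the key difference from backprop.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """One plain Hebbian step: each weight grows by lr * pre_i * post."""
    return [w + lr * x * post for w, x in zip(weights, pre)]


weights = [0.0, 0.0]
# Each event pairs presynaptic activity on two inputs with postsynaptic activity.
# Input 0 is active every time the neuron fires; input 1 is active three times
# but coincides with firing only once.
events = [
    ([1, 0], 1),
    ([1, 1], 1),
    ([0, 1], 0),
    ([1, 0], 1),
    ([0, 1], 0),
]
for pre, post in events:
    weights = hebbian_update(weights, pre, post)

print(weights)  # input 0's weight ends up larger than input 1's
```

The input whose activity coincides with firing ends up with the stronger connection. Plain Hebbian weights grow without bound; variants such as Oja's rule add a normalization term.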
> LLMs "learn" in the same way that Excel sheets "learn".
I've never seen an Excel sheet do anything like backpropagation.
But, more importantly, can't OpenAI also be sued for tortious interference? (Basically the civil equivalent of being an accessory.)
You misunderstood me. I was talking about something more fundamental.
Understanding is data compression. They are the same thing. Learning patterns, building mental models, creating abstractions, generalizing, gaining intuition or a feel for something: all the things humans do as part of learning and understanding the world are acts of lossy data compression.
Most of the world disagrees with this view, and that means they will create the AI that wins.
I await the HN ban with fear...
[1] I'm not even doing referencing, so I am surely an LLM.
Humans have rights, software tools don’t.
If you grant an LLM the full set of human rights, then it can consume information, regurgitate copyrighted works, and use them to generate money for itself. However, treating blatantly obvious theft as “homage” goes hand in hand with free will, agency, being in control of yourself, not being enslaved and abused, etc. Pondering various scenarios along those lines really gets to the heart of why an LLM is so very much not a human, and why subjecting it to the same treatment as humans is a ridiculous notion.
If you don’t grant an LLM human rights, then ClosedAI’s stance is basically that pirating works is OK because they pass them through a black box of if conditions and it leads to results they can monetize. That’s such a solid argument, it’ll surely play well in a court of law.
Training on data is not a case of “the LLM does it”: first, because “it” here is not “learning” or understanding in the human sense (otherwise you would have to presume that an LLM is a human), and second, because a software tool doesn’t have agency, and it’s really just Microsoft using a tool built on copyrighted works to generate profit.
What I expect to happen is that whoever has the most influence and power will get what they want, and we'll end up raising a generation with the implicit understanding that this is "just how things are": natural order, truth, reality, and all that jazz.
The only thing that ever changes outcomes is the contradictions in the status quo becoming impossible to manage.
Our collective human limitations (physical, mental and temporal) are sort of invisible, implicit rules that we all follow in one way or another. If an entity is not bound by those rules, then I don't see why that entity should be treated the same as a human.
Companies already make this differentiation.
For example, take CAPTCHAs and bot detection. Some of the heuristics are based on inherent human limitations such as response time, click time, mouse acceleration, etc.
I doubt YouTube or any other streaming service would be happy if you streamed all their videos to train a hypothetical human-like AI (one that watches and takes notes like a human) at a hugely accelerated speed compared to a regular human. You can guess how quickly they would cite their fair usage policies.
My point is that there are fundamental differences between a human and an AI, so we should not be quick to dismiss concerns just because AI can "mimic" humans in certain areas.
Anyone got more details on this?
Superficially it sounds like total BS; a highly compressed zip file does not exhibit any characteristics of learning.
Algorithmically derived highly compressed video streams do not exhibit characteristics of learning.
?
I’ve vaguely heard that learning can be considered to exhibit the characteristics of compression, in that understanding content (e.g. segmenting video content so it compresses more highly) can lead to better compression schemes.
…but going from “you can do a with b” to “a and b are fundamentally the same thing” seems like a leap…?
It seems self evident you can have compression without comprehension.
An LLM has limited parameters. If an LLM had infinite parameters, it could just memorize the results of every single addition question in existence, and it could not claim to have understood anything. Because it has finite parameters, if an LLM wants to get a lower loss on all addition questions, it needs to come up with a general algorithm to perform addition. Indeed, Neel Nanda trained a transformer to do addition mod 113 on relatively few examples, and it eventually learned some cursed Fourier transform mumbo jumbo to get 0 loss: https://twitter.com/robertskmiles/status/1663534255249453056
And the fact that it has developed this "understanding", the ability to learn a general pattern in the training data, is what enables it to compress. I claim that the number of bits required to encode the general algorithm is smaller than the number of bits required to memorize every single example. If it weren't, the transformer would simply memorize every single example. But if it doesn't have the space, it is forced to compress by developing a general model.
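To put rough numbers on that claim, here is a crude back-of-the-envelope sketch in Python, assuming the modulus 113 from the experiment linked above. It compares storing every answer of addition mod 113 as a lookup table against storing a one-line rule, using the rule's source-text length as a stand-in for "bits to encode the general algorithm". This only illustrates description length; it is not how a transformer stores anything (it stores weights), and all names below are hypothetical.

```python
import math

N = 113  # modulus used in the experiment linked above

# Option 1: memorize. Store one answer for every ordered pair (a, b).
table = {(a, b): (a + b) % N for a in range(N) for b in range(N)}
bits_per_answer = math.ceil(math.log2(N))      # 7 bits name one of 113 residues
memorized_bits = len(table) * bits_per_answer  # 12769 * 7 = 89383 bits

# Option 2: a general rule, measured crudely as the size of its source text.
rule_source = "lambda a, b: (a + b) % 113"
rule = eval(rule_source)  # evaluating a fixed literal string, for illustration only
rule_bits = 8 * len(rule_source.encode("ascii"))  # 26 bytes = 208 bits

# The rule reproduces every memorized answer, in far fewer bits.
assert all(rule(a, b) == ans for (a, b), ans in table.items())
print(f"lookup table: {memorized_bits} bits, rule: {rule_bits} bits")
```

The gap is several hundredfold, which is the pressure the comment describes: a model without room for the table is pushed toward the cheaper general rule.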
And the ability to compress enables you to construct a language model. Essentially, the better a sequence compresses, the higher the likelihood you assign it. Given a sequence of tokens, say "the cat sat on the", we should expect "the cat sat on the mat" to compress into fewer bits than "the cat sat on the door". This is because the former is far more common, and intuitively, more common sequences should compress better. You can then look at the number of bits used for every single choice of token following "the cat sat on the" and thus develop a probability distribution for the next token. I'm unclear on the exact details of this; https://www.hendrik-erz.de/post/why-gzip-just-beat-a-large-l... gives a good summary.
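The scoring idea can be sketched with an off-the-shelf compressor. This is a minimal sketch in Python, assuming zlib (DEFLATE) as the compressor and a tiny toy corpus; `compressed_bits` and `next_token_distribution` are hypothetical helpers, not the method from the linked post. It measures how many extra bits each candidate continuation costs after a shared context and turns that into a distribution with p ∝ 2^(-bits). zlib only reports whole bytes and has no real model of language, so treat the numbers as a cartoon.

```python
import zlib


def compressed_bits(text: str) -> int:
    """Size of `text` after zlib (DEFLATE) compression at level 9, in bits."""
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))


def next_token_distribution(corpus: str, prompt: str, candidates: list[str]):
    """Score candidates by extra compressed bits; fewer bits -> higher probability."""
    context = corpus + prompt + " "
    base = compressed_bits(context)
    # Extra bits needed to encode each continuation after the shared context.
    extra = {tok: compressed_bits(context + tok) - base for tok in candidates}
    # p(tok) proportional to 2 ** -extra_bits; shift by the minimum to avoid underflow.
    lowest = min(extra.values())
    weights = {tok: 2.0 ** -(b - lowest) for tok, b in extra.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    return extra, probs


# Toy "training data": the compressor has only ever seen this phrase.
corpus = "the cat sat on the mat. " * 50
extra, probs = next_token_distribution(corpus, "the cat sat on the", ["mat", "door"])
print(extra)
print(probs)
```

Here "mat" just extends a phrase the compressor has already seen, while "door" needs new literal characters, so "mat" gets the higher probability. A real compression-based language model would use a compressor with fine-grained (fractional-bit) costs, such as arithmetic coding driven by a probabilistic model.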
I fundamentally disagree. That's not some established fact, just a narrative used by those who wish to plagiarize using "AI".
Here's an article from November 2023 that discusses this:
https://not-just-memorization.github.io/extracting-training-...