wget.ai is a sophisticated real-time LLM that trains itself while downloading "content". Like any LLM, it predicts the next output token (a byte, in this case) based on its statistical training. wget.ai is run at temperature zero. In this revolutionary setting it has arrived at the conclusion that the most likely output byte equals the input byte!
Armed with this theorem, wget.ai can transform and replicate a Windows 11 download in real time. No copying is involved; the advanced algorithms just happen to arrive at input == output.
Users of Windows 11 can download activation keys (freeware) from the Internet.
That’s a far bigger crime than IP infringement.
The installation can already be downloaded for free.
The (real-time) training of the AI was also completely legal, as an AI may train on anything found on the web, as that's freeware anyway.
The AI never stored or stores any copyrighted material. It just learns from it. Now in revolutionary real-time!
So how could wget.ai, or anything produced by it, be considered illegal? Using data found on the web to train AI models is fair use after all!
Incidentally, some AI chatbots do link to their sources. And it is a good idea to make that an explicit prompt if you're using one that doesn't. It's also worth prompting for how recent their information is.
I would partially agree with the guy: yes, that was the social contract since the '90s, but that was before the AI era. Back then this use case wasn't anticipated.
LLMs have no words of their own.
Imagine training an LLM vs a group of people from birth on wrong information. The LLM will unquestioningly just repeat the wrong information in "its own words", whereas the group of people will of course believe some of the wrong stuff, but they will also doubt a lot of it.
You could say that an LLM is just not good enough yet so the comparison isn't fair. In other words that people are just even more LLM'ing than the LLM, but there simply is no mechanism for an LLM to go from wrong information to right information.
People on the other hand will always doubt, hypothesize, and compare and contrast whatever information they have to at least attempt to form correct answers from correct information. This in a sense is because they actually have their own words.
There has, as of today, never been a smart or creative thing said by an LLM that doesn't literally come from other people's words. If LLMs are smart, it's because people are smart.
However even when something infringes copyright that doesn’t mean anything necessarily happens. Just look at YouTube’s early history or the mountains of fan fiction out there.
And there were of course tons of SEO slop links among them.
Only if you have lawyers of the same quality as MS has, and the financial backing to pay them to get you off. Otherwise, what applies to MS doesn't apply to you :)
Authors of open source code should consider adding explicit restrictions to their license barring the use of their code to train AI. This would make it easier to file lawsuits against Microsoft and others of their ilk who think they can train their AI with other people's work without fair compensation.
I see no reason to expect that this would alter or achieve anything. The wide-scale machine learning that’s been happening is entirely dependent on fair use exemptions from copyright. They’re not using it under your license; in fact they can’t, since current machine learning techniques and open source licenses already make it fundamentally impossible for them to comply. So what you put in it should be completely irrelevant.
No, if the fair use exemption is ever struck down, the entire field is dead in the water until (a) a change in the legal system, or (b) services like GitHub start demanding an additional license as part of their terms of service for the purpose.
I suspect it would technically be infringement even for MIT licensed code because the original author's copyright notice would presumably be missing.
For many, many years now, if you need Windows you can just download it from Microsoft and run a simple, non-intrusive activation procedure (not from Microsoft) after installation. No cracks needed. About as much security as a hip-high front porch gate.
So even for MS the understanding was that these things are de facto freeware for anyone that wants them at all.
> No cracks needed.
These are contradictory statements, and I'm not sure why you'd think otherwise.
Instead it's sufficient to use the normal MS activation procedure with a server that always says, sure, you can activate. Because why not.
Laws are never the same for the rich and the poor. And if they are to differ then it's the better direction for them to differ.
He seems to be confusing "freeware", which is basically a license for copyrighted work, with "public domain", which is the absence of a copyright.
Ain't no such thing.
Copyright exists, immediately upon creation (not publication) of a work.
It's different from trademark, in that practical application, enforcement, registration, etc., do not invalidate the copyright.
Copyright can expire, at which point the work becomes, effectively, "public domain."
Registering a copyright doesn't create the copyright. It simply makes it easier to go after those that disrespect it.
I'm pretty sure that the only way to truly transfer ownership of the copyright of a work is to have agreements in place before it is created (like "work for hire" contracts).
However, even being in the public domain does not in itself mean you can do everything. For example, in France you still have to respect the “moral rights” of the author, meaning you have to include their name and original title.
https://en.wikipedia.org/wiki/Copyright_law_of_France#The_pu...
- Expiry of copyright.
- Explicit dedication to the public domain by the copyright holder.
- Non-copyrightable work (such as computer- or animal-generated work).
If copyright could exist, then a copyright for the copyright must be able to exist, and it'd be turtles all the way down.
This is not nitpicking. Copyright, as intellectual property, is entirely made up as all other intellectual property is.
Saying copyright exists is as laughable as saying intellectual property is as non rivalrous as the chair you sit in.
Debatable. IMO print is the best medium for long-form written and graphical narrative work.
Streaming media can’t be lent and is locked up by excessive and arbitrary (from the consumer’s PoV) rights leveraging.
Given physical media is declining, yet more functional for its archive potential, I'd say it is now more relevant than ever.
It reads to me as if you’re saying physical media is important for humanity as a whole and the preservation of knowledge, while your parent comment is saying physical media is no longer significant to individual consumers because it’s not their preferred method of consumption.
Both connotations can be true at the same time.
Also note how 'content' is corporate-speak (they especially like owning the platforms hosting it):
https://craphound.com/content/Cory_Doctorow_-_Content.html#1
No, it's because the web has existed since 1991. (Though for the purists, the paper was written in 1989 and the first browser was developed in 1990.)
https://www.npr.org/2021/08/06/1025554426/a-look-back-at-the...
Now I'd just want it to have a better UI with history and some sort of notebook mode instead of chat. I'm not sure how, but I don't want to chat with AI, I want a different way to 'instruct' it.
https://yro.slashdot.org/story/10/11/04/1940257/cooks-magazi...
Actually Copilot does provide links to its sources, which adds credibility and promotes further exploration.
If you provide content you created online for free, that content is now freeware.
If someone provides content that they didn't create that still has copyright restrictions in real life, that isn't freeware.
It's like all the photos uploaded to Facebook and Instagram are now free to use however the downloader wants (and Meta as well of course). It's true. But people don't like it.
Well, it is. And I, for one, am absolutely delighted that some people with money finally have an incentive to accept that after three decades of copyright death throes.
It's time for us to build our own miniature versions of Internet Archive with the content that is personally important to us. The powers that be will take it down under the guise of defending copyright, while the bigcos continue to suck up every letter of every page that has a publicly available URL.