It does suit the modus operandi of a number of American companies that start out as literally illegal/criminal operations until they get big and rich enough to pay a fine for their youthful misdeeds.
By the time some of them get huge, they're in bed with the government to dominate the market.
The anti-AI stance is what is baffling to me. The path trotten is what got us here and obviously nobody could have paid people upfront for the wild experimentation that was necessary. The only alternative is not having done it.
Given the path it has put as in, people either are insanely cruel or just completely detached from reality when it comes to what is necessary to do entirely new things.
Perhaps the biggest “needs citation” statement of our time.
Not in any weirdly-self-aggrandizing "our tech is so powerful that robots will take over" sense, just the depressingly regular one of "lots of people getting hurt by a short-term profitable product/process which was actually quite flawed."
P.S.: For example, imagine having applications for jobs and loans rejected because all the companies' internal LLM tooling is secretly racist against subtle grammar-traces in your writing or social-media profile. [0]
Yes, we are clearly talking about things to mostly still come here. But if you assign a 0 until its a 1 you are just signing out of advancing anything that's remotely interesting.
If you are able to see a path to 1 on AI, at this point, then I don't know how you would justify not giving it our all. If you see a path and in the end using all of human knowledge up to this point was needed to make AI work for us, we must do that. What could possibly be more beneficial to us?
This is regardless of all issues the will have to be solved and the enormous amount of societal responsibility this puts on AI makers — which I, as a voter, will absolutely hold them accountable for (even though I am actually fairly optimistic they all feel the responsibility and are somewhat spooked by it too).
But that does not mean I think it's responsible to try and stop them at this point — which the copyright debate absolutely does. It would simply shut down 95% of AI, tomorrow, without any other viable alternative around. I don't understand how that is a serious option for anyone who roots for us.
OpenAI's case is especially egregious, with the entire starting as 'open' and reaping the benefits, then doing its best in every way to shut the door after itself by scaring people over AI apocalypses. If your argument is seriously that it is necessary to shamelessly steal and lie to do new things, I question your ethical standards, especially in the face of all the openly developed models out there.
The anti-AI stance is what is baffling to me.
I think it’s unfair to paint any legal controls over this incredibly important, high-stakes technology as being “anti”. They’re not trying to prevent innovation because they’re cruel, they’re just trying to somewhat slow down innovation so that we can ensure it’s done with minimal harm (eg making sure content creators are compensated in a time of intense automation). Like we do for all sorts of other fields of research, already!And isn’t this what basically every single scholar in the field says they want, anyway - safe, intentional, controlled deployment?
As you can tell from the above, I’m as far from being “anti-AI” or technically pessimistic as one can be — I plan to dedicate my life to its safe development. So there’s at least one counterexample for you to consider :)
> The anti-AI stance is what is baffling to me
I don't see s lot of anti AI but instead I see a concern for how it's just being managed and controlled by the larger companies with resources that no start up could dream. Open AI was to release it's models and be well.. Open but fine they're not. But their behaviour of how things are proceeding are questionable and unnecessarily aggravating.
And who is the one calling for action?
Sorry for being dense, but I'm trying to understand if I'm the "strong" or the "weak" in your analogy.
"Hugely beneficial" is a stretch at this point. It has the potential to be hugely beneficial, sure, but it also has the potential to be ruinous.
We're already seeing GenAI being used to create disinformation at scale. That alone makes the potential for this being a net-negative very high.
I don't think this is the "ends justify the means" argument you think it is.
The crazy thing is that there hasn't been an injunction to make them stop.
Update: ML doesn't copy information. It can merely memorise some small portions of it.
A more fitting metaphor would be something like... If you had the ability to read all the books in the library extremely quickly, and to make useful mental connections between the information you read such that people would come to you for your vast knowledge, should you be allowed in the library?
https://www.copyright.gov/title37/201/37cfr201-14.html
§ 201.14 Warnings of copyright for use by certain libraries and archives.
....
The copyright law of the United States (title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material.
Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specific conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use,” that user may be liable for copyright infringement.
This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.
You can make a copy. If you (the person using the copied work) are using it for something other than private study, scholarship, research, or reproduction beyond "fair use", then you - the person doing that (not the person who made the copy) are liable for infringement.It would be perfectly legal for me to go to the library and make photocopies of works. I could even take them home and use the photocopies as reference works write an essay and publish that. If {random person} took my photocopied pages and then sold them, that would likely go beyond the limits placed for how the photocopied works from the library may be used.
burning the bridge so nobody else can legally scrape, that's the line.
Assets like the Internet Archive, though, should be protected at all costs.
It's completely unprecedented.
We allowed scraping images and text en masse when search engines used the data to let us find stuff.
We allow copying of style, and don't allow writing styles and aesthetics to be copyrighted or trademarked.
Then AI shows up, and people change lanes because they don't like the results.
One of the things that made me tilt towards the side of fair use was a breakdown of the Stable Diffusion model. The SD2.1 base model was trained on 5.85 billion images, all normalized to 512x512 BMP. That's 1MB per images, for a total of 5.85PB of BMP files. The resulting model is only 5.2GB. That's more than 99.999999% data loss from the source data to the trained set.
For every 1MB BMP file in the training dataset, less than 1byte makes it into the model.
I find it extremely difficult to call this redistribution of copyrighted data. It falls cleanly into fair use.
Their arguments against this amounts to "we're not using it like they intend it to be used, so it's fine if we obtain it illegally", and that's a bs standard, totally divorced from any legal reality.
Fair Use covers certain transformative uses, certainly, but it doesn't cover illegal obtaining of the content.
You can't pirate a book just because you want to use it transformatively (which is exactly what they've done), and that argument would never hold up for us as individuals, so we sure as hell shouldn't let tech companies get a special carve-out for it.
I do not have confidence in the Supreme Court in general, and I think there's a real risk that in deciding on AI training they upend copyright of digital materials in a way that makes it worse for everyone.