Scientific research output should be free, universally, without hindrance.
It's myopic to try to extract wealth from this public good by siloing it, by toll-gating access to it. It's like barricading a public highway with toll-booths every 500 meters: a myopia that's blind to the public-good value of infrastructure, a myopia of greed that's a universal drain on public wealth, for some petty local optimization.
If you obstruct ML models on some financial profit theory, you're obstructing not only the ML entities; you're obstructing the thousand researchers downstream who stand to benefit from them. You're standing in the road blocking traffic, collecting tolls; you've not only stopped the vehicle in front of you, you've stopped a thousand more stranded behind it. It is a public nuisance.
Do you have a citation for that? Annual budget of the NSF is about $8 billion, covering all of science. Total NIH is $45 billion, and that includes other things. DOE Office of Science is about $8 billion. I think both of those cover a vast majority of grants funded in academia.
I suppose if you factor in some sort of tax breaks you could increase that, but I doubt you would get to trillions.
(I say this because I feel people drastically overestimate the amount of funding science gets in the US. For example, compare the above numbers with the profits of big tech companies.)
[1] https://www.stlouisfed.org/on-the-economy/2018/may/rd-busine...
In the US. Also even in the US, is NASA included in one of those? Looks like it costs around 20B yearly. There may be more from the US, and then there's, you know, the rest of the world combined too.
If they're not comfortable with that, then they aren't comfortable with their research being free-as-in-freedom.
>"We are at a crossroads in the production and dissemination of research knowledge, and in my view the biggest problem with this deal is the reduction of academic research into raw content from which data can be extracted and repackaged as knowledge," Clemens said.
I'm sorry, what? The problem researchers have is their research being used as a resource for general knowledge? Do they only want their research to be helpful for other research and never have applicability outside academia?
It's all a question of who gets to do this first; until then, everyone who doesn't gatekeep is a sucker. You think M$ won't do this?
I mean.. if dissemination of knowledge _truly_ is the end goal here.. and not just giving out a free corporate lunch because the job market is broken and dysfunctional.
I have not kept up with the latest on LLMs and licensing, but I’m curious: are scientific papers accessible to LLMs? Honestly, a bigger societal loss in my view is publishers like Elsevier restricting LLM access to research articles, rather than being too permissive. I could not care less if Elsevier makes a little bit of money in the process.
hand-waving it by "oh just have corporations promise they will cite their sources, honor system" is the easiest way to not have corporations cite their sources. Plus if these corporations are so gung-ho about how "we don't store your data in an LLM", that wouldn't be a trivial matter anyway.
>a bigger societal loss in my view is publishers like Elsevier restricting LLM access to research articles, rather than being too permissive.
it's only a bigger loss if you think the knowledge will be proportionally handed down to society. Which it rarely is.
I have been e-mailing authors directly for access to articles and have been successful most of the time. Maybe once or twice, the e-mails went unanswered. In one case, the primary author had died (a subsequent search turned up a news article about a skiing accident). In another case, the author had simply moved to a different university, so the e-mail address in the paper was outdated.
Those are both topics that could each be a post in and of themselves, so I'll just keep it simple and emphasize once again that we should apply the 3C's when asking anything of another person's IP. I doubt many of the older papers/articles had contracts that allowed for such usage. This is reinforced by the article:
>The agreement with Microsoft was included in a trading update by the publisher’s parent company in May this year. However, academics published by the group claim they have not been told about the AI deal, were not given the opportunity to opt out and are receiving no extra payment for the use of their research by the tech company.
Regardless of your position, this publishing group at worst lied and at best is being irresponsible; this isn't even an issue of AI or copyright. We can debate "well this is how it should be", but let's leave ShouldLand for a bit and actually look at the current situation. Trust is being broken in real time.
We need new publishing models with strict copyright protections that protect against theft. Academics should run their own publishing houses as a cooperative.
Granted, you may argue that it is done poorly today, but it certainly doesn't happen at no cost.
No, we need new publishing models that release the publicly-funded information into the public domain. If tax dollars paid for the research then there should be no copyright on the results of that research at all, whether owned by the publisher or the authors.
This wouldn't represent a change for the authors anyway, who already are accustomed to having to sign away copyright (hence this story). The problem in this instance isn't that the papers are being fed into machine learning models, the problem is that the publisher is extracting a check for the privilege.
I see no indication that the affected academics are concerned with the money that their papers make them. Most want to give them away for free to HUMANS.
Edit: [1] By others, I mean our university libraries and any poor soul who does not have membership in one. Today, universities pay to access their own research.
Having a totally artificial middleman makes no sense when basically all publishing happens digitally. And the public deserves full access to the science they have funded.
https://en.wikipedia.org/wiki/Open_access#Colour_naming_syst...
Are they not a fact discoverer or truth revealer?
It's unclear to me that researchers should “own” truths that prior research and public patronage enabled them to unearth.
// note: research != invention, i.e., SpaceX experimenting until systems and machinery can land a rocket on a barge is not “research”, but testing and documenting characteristics of fuels in a vacuum as the environment swings from -100C to 120C is
Why? What else could it be if not research?
it could be experimentation and learning about your own invented system and mechanics, not the uncovering of existing truths
the word "search" makes this clearer: the answers research finds already exist, we just haven't found them yet. that is not the case for inventions that don't work yet and need to be experimented with until they do.
there may, certainly, be some incidental research discovery while attempting to make an invention work (e.g. if SpaceX had hit on something new about physics itself that had been causing their prior experiments to fail) -- but nobody "owns" such truth discovered, while of course they own their invention that works as a result of applying known truths to a creation.
The article seems to want to bring rights holders together in a way that isn’t valid.
A scientific paper is normative. Its construction is closer to a pull request than artwork.