This is such devious, but increasingly obvious, narrative crafting by a commercial entity that has proven itself adversarial to an open and decentralized internet / ideas and knowledge economy.
The argument goes as follows:
- The future of AI is open source and decentralized
- We want to win the future of AI instead, become a central leader and player in the collective open-source community (a corporate entity with personhood for which Mark is the human mask/spokesperson)
- So let's call our open-weight models open source, benefit from that term's image, require all Llama developers to transfer any goodwill to us, and decentralize responsibility and liability for when our $20-million-plus "AI jet engine" waifu emulator causes harm.
Read the terms of use / contract for Meta AI products. If you deploy a model and some producer finds it spits out copyrighted content and knocks on Meta's door, Meta will point to you for the rest of the court case. If that's the future of AI, then it doesn't really matter whether China wins.
As much as I hate Facebook, I think that seems pretty… reasonable? These AI tools are just tools. If somebody uses a crayon to violate copyright, the crayon is not to blame, and certainly the crayon company is not, the person using it is.
The fact that Facebook won’t voluntarily take liability for anything their users’ users might do with their software means that software might not be usable in some cases. That is a reason to avoid it if you have one of those use cases.
But I think if you find some company that says “yes, we’ll be responsible for anything your users do with our product,” I mean… that seems like a hard promise to take seriously, right?
While Mark claims his open-source AI is safer because it is fully transparent and many eyes make all bugs shallow, the latest technical report mentions an internal, secret benchmark that had to be developed because available benchmarks did not suffice at that level of capability. On child abuse material generation, it mentions only that this was investigated, not any results of those tests or the conditions under which the model possibly failed. They shove all of this liability onto the developer while claiming any positive goodwill generated.
They lose all motivation to care about AI safety and ethics if fines punish not them but those who used the library to build.
Reasonable for Meta? Yes. Reasonable for us to nod along when they misuse open source to accomplish this? No.
I think the success of "Threads + Fediverse = <3" relies on the Fediverse not throwing in the towel and leaving Threads as the biggest player in the space. That would mean fixing a lot of the problems that people have with ActivityPub today.
I don't want to say big tech is awesome and without fault, but at the end of the day big tech will be big tech. Let's keep the Fediverse relevant and Meta will continue to support it; otherwise it will be swallowed by the bigger fish.
Given the nature of the fediverse, whether it happened or not depends on the instance you use/follow.
So Meta says "well, we will buy tons of compute and try to make it distributed" and "we'll make the model open and people will fine-tune with data that they found," and so on. Now Google and OpenAI aren't competing against Meta; they are competing against Meta + all compute owned by amateurs + all data scraped by amateurs, which is non-trivial. So it's not so much aspiring to be #1 as kneecapping competitors who have superior competitiveness - but people love it because the common man wins here for once.
Anyway, eventually they'll all be open models. In the near future, weaker models will run on a PC, bigger models on the cluster, the weakest models on the phone... then just weak models on the phone and bigger ones on the PC... eventually anything and everything fits on a phone and maybe a watch. Even Google and OpenAI will have to run on the PC/phone at that point; it wouldn't make sense not to. Then, since people have local access to these devices, it all gets reverse engineered, boom boom boom, and now they're all open.
Code is a single input that is cheap to compile, modify, distribute, and run.
Models are many things: data sets, data-set processing code, training code, inference code, weights, etc. But it doesn't even matter whether all of these inputs are "open source": models take millions of dollars to train, and the inference costs aren't cheap either.
edit:
Remember when platforms ate the open web? We might be looking at a time where giants eat small software due to the cost and scale barriers.
Everyone tries this. Apple tried it with lawsuits and patents, Facebook did it under the guise of privacy, OpenAI will do it under the guise of public safety.
There's almost no case where a private company is going to successfully argue "they shouldn't be allowed, but we should." I wonder why so many companies these days try. Just hire better people and win outright.
The #1 problem is not compute, but data and the manpower required to clean that data up.
The main thing you can do is support companies and groups who are releasing open source models. They are usually using their own data.
To my knowledge, all of the notable open-source models are subsidised by corporations in one way or another, whether as the side project of a mega-corp that can absorb the loss (Meta) or by coasting on investor hype (Mistral, Stability). Neither of those gives me much confidence that they will continue forever, especially the latter category, which will just run out of money eventually.
For open-source AI to actually be sustainable it needs to stand on its own, which will likely require orders-of-magnitude more efficient training, and even then the data cleaning and RLHF are a huge money sink.
The #1 problem is absolutely compute. People barely get funding for fine tunes, and even if you physically buy the GPUs it'll cost you in power consumption.
That said, good data is definitely the #2 problem. But nowadays you can get good synthetic datasets by calling closed-model APIs or by using existing local LLMs to sift through the trash. That'll cost you too.
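A minimal sketch of the "sift through trash" idea: score every raw sample with a judge and keep only the high scorers. Everything here is an assumption for illustration; in a real pipeline `score_sample` would prompt a local LLM (or a closed-model API) to rate each sample, whereas below it is a stand-in heuristic so the sketch actually runs.

```python
# Sketch: filter a raw text corpus by a quality score, keeping good samples.
# score_sample() is a placeholder for a real LLM judge call.

def score_sample(text: str) -> float:
    """Placeholder judge: longer, cleanly punctuated text scores higher (0.0-1.0)."""
    if not text.strip():
        return 0.0
    words = text.split()
    length_score = min(len(words) / 50.0, 1.0)          # reward some substance
    ends_cleanly = 1.0 if text.rstrip()[-1] in ".!?" else 0.5
    return length_score * ends_cleanly

def filter_corpus(samples: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only samples the judge scores at or above the threshold."""
    return [s for s in samples if score_sample(s) >= threshold]

if __name__ == "__main__":
    raw = [
        "asdf asdf asdf",   # junk: too short, no terminal punctuation
        "The steam engine reshaped manufacturing by decoupling factories "
        "from water power, and with it the economics of where cities grew. " * 2,
    ]
    kept = filter_corpus(raw)
    print(f"kept {len(kept)} of {len(raw)} samples")
```

Swapping the heuristic for a real judge model changes nothing about the pipeline shape, which is why the cost scales with corpus size: one model call per raw sample.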
Alternatively we could create standardized open source training data like wikipedia, wikimedia as well as public domain literature and open courseware. I'm sure that there are many other such free and legal sources of data.
Look at how much compute purple AI actually has. It’s basically nothing.
AFAICT it decentralizes the training of these models by giving you an incentive to train models which will mine the crypto if you're improving it.
I learned about it years ago, mined some crypto, lost the keys, and now I'm kicking myself cuz I would've made a pretty penny lol
1. Open source is for losers. I'm not calling anyone involved in open source a loser, to be clear. I have deep respect for anyone who volunteers their time for this. I'm saying that when companies push for open source it's because they're losing in the marketplace. Always. No company that is winning ever open sources more than a token amount for PR; and
2. Joel Spolsky's now 20+ year old letter [1]:
> Smart companies try to commoditize their products’ complements.
Meta is clearly behind the curve on AI here so they're trying to commoditize it.
There is no moral high ground these companies are operating from. They're not using their vast wisdom to predict the future. They're trying to bring about the future the most helps them. Not just Meta. Every company does this.
It's why you'll never see Meta saying the future of social media is federation, open source and democratization.
[1]: https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
Then I can go ahead and train my open source model.
> OpenAI or Claude seem to garner more positive views than llama open sourced.
That's more about Meta than the others, although OpenAI isn't that far from Meta already.
Now ask yourself a question: where does Meta's data come from? Perhaps from their users' data? And they opted everyone in by default, and made the opt-out process as cumbersome as possible: https://threadreaderapp.com/thread/1794863603964891567.html And now they complain that the EU is preventing them from "collecting rich cultural context" or something: https://x.com/nickclegg/status/1834594456689066225
Nope, not yet.
FAIR, the people that do the bigboi training, for a lot of their stuff can't even see user data, because the place they do the training can't support the access.
It's not like OpenAI, where the lawyers don't even know what's going on because they haven't yet been properly taken to court.
At Meta, the lawyers are everywhere, and if you do naughty shit to user data, you are going to be absolutely fucked.
Because all indications are that the powers over you cannot abide your freedoms of association, communication and commerce.
So, if it’s something your family needs to survive, it had better be distributed and cryptographically secured against interference.
This includes interference in the training dataset of whatever AIs you use; this has become a potent influence on the formation of beliefs, and thus extremely valuable.
All of these models, including the "open" ones, have been RLHF'ed by teams of politically-motivated people to be "safe" after initial foundation training.
This corruption must be disclosed as assiduously as the base dataset, if not more so.
Try it for yourself: https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
* does not apply to training data
This is chess pieces being moved around the board at the moment.
Yet when you look back at history, the things that were revolutionary were so thanks to a low cost of production: the web, bicycles, cars, steam engines, etc.
Nuclear everything, rockets/satellites, tons of revolutionary things that are very expensive to produce and develop.
Also software scales differently.
The cost of compute will continue decreasing, and we will reach the point where it is feasible to have AI everywhere. I think with this particular technology we have already passed the point of no return.
Which costs significantly more than an H100, at least when renting [1]. Also, the price of the hardware isn't significantly lower.
Also, both AMD and Nvidia have been deliberately stalling progress in cheaper consumer graphics cards by not increasing VRAM and by removing things like fast interconnects.
There is a lot of interest in regulating open-source AI, but much of the criticism misses the point that open-source AI helps democratize access to technology. It worries me that Meta is proposing an open-source and decentralized future: how does that serve their company? Or is there some hope of creating a captive audience? I hate to be a pessimist or cynic, but I'm just wondering out loud, haha. I am happy to be proven wrong.
If that means it'll be free/cheaper... sure
And for all the negativity in many of the comments here, I think it’s actually quite remarkable that they make model checkpoints freely available. It’s an externality, but a positive one. Not quite there yet in terms of the ideal - which is definitely open source - and surely with an abuse of language, which I also note. But overall, it's the best that is achievable now, I think.
The true question we should be tackling is: is there an incentive-compatible way to develop foundation models in a truly open-source way? And how do we promote those conditions, if they exist?
This must be a sign that Meta is not confident in their AI offerings.
They use "open source" to whitewash their image.
> In 2012, Red Hat Inc. accused VMWare Inc. and Microsoft Corp. of openwashing in relation to their cloud products.[6] Red Hat claimed that VMWare and Microsoft were marketing their cloud products as open source, despite charging fees per machine using the cloud products.
Other companies are far more careful about using "open source" in relation to their AI models. Meta now practically owns the term "Open Source AI" for whatever they take it to mean; they might as well call it Meta AI and be done with it: https://opensource.org/blog/metas-llama-2-license-is-not-ope...