Llama 3.1 - https://news.ycombinator.com/item?id=41046540 - July 2024 (114 comments)
$279mm in 1957 dollars is about $3.2bn today [2]. A public cluster of GPUs provided for free to American universities, companies and non-profits might not be a bad idea.
[1] https://en.m.wikipedia.org/wiki/Heavy_Press_Program
[2] https://data.bls.gov/cgi-bin/cpicalc.pl?cost1=279&year1=1957...
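For the curious, the adjustment is just a CPI ratio; a quick sketch (the index values are rough assumptions from memory, not pulled from the BLS calculator linked above):

    # Rough CPI adjustment; both index values are approximate assumptions.
    cpi_1957 = 28.1    # approx. annual average CPI-U for 1957
    cpi_2024 = 314.0   # approx. CPI-U for mid-2024
    cost_1957_m = 279  # program cost, millions of 1957 dollars
    print(f"${cost_1957_m * cpi_2024 / cpi_1957 / 1000:.1f}bn today")  # ~$3.1bn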
(To connect universities to the different supercomputing centers, the NSF funded the NSFnet network in the 80s, which was basically the backbone of the Internet in the 80s and early 90s. The supercomputing funding has really, really paid off for the USA)
Not sure why a publicly accessible GPU cluster would be a better solution than the current system of research grants.
Sure, academia could build LLMs, and there is at least one large-scale project for that: https://gpt-nl.com/ On the other hand, this kind of model still needs to demonstrate specific scientific value that goes beyond using a chatbot for generating ideas and summarizing documents.
So I fully agree that the research budget cuts in the past decades have been catastrophic, and probably have contributed to all the disasters the world is currently facing. But I think that funding prestigious super-projects is not the best way to spend funds.
Until we get cheaper cards that stand the test of time, building a public cluster is just a waste of money. There are far better ways to spend $1b in research dollars.
AI is a fad, the brick and mortar of the future is open source tools.
The USA and Europe are already doing that on a grand scale, in different forms, both at the national and international levels.
I work at an HPC center which provides servers nationally and collaborates at the international level.
[1] https://www.technologyreview.com/2024/05/13/1092322/why-amer...
How much capability would $3.2bn in terms of AI computing power provide, including the operational and power costs of the cluster?
Certainly, you could build a "$3.2bn GPU cluster", but it would be dark (you couldn't afford to power it).
So, how much learning time would $3.2bn provide? 1 year? 10 years?
Just curious about hand-wavy guesses. I have no idea of the scope of these clusters.
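For a hand-wavy answer, here's a sizing sketch where every figure is an assumption, not a quote:

    # Hand-wavy sizing of a $3.2bn cluster; all inputs are assumptions.
    budget = 3.2e9
    gpu_cost = 30_000    # assumed all-in cost per H100-class GPU
    overhead = 0.5       # assumed extra 50% for networking/hosts/datacenter
    gpu_kw = 1.0         # assumed per-GPU draw incl. cooling
    usd_per_kwh = 0.08   # assumed industrial electricity price

    n_gpus = budget / (gpu_cost * (1 + overhead))
    power_per_year = n_gpus * gpu_kw * 24 * 365 * usd_per_kwh
    print(f"~{n_gpus:,.0f} GPUs, ~${power_per_year / 1e6:.0f}M/yr in power")

So the capex alone buys on the order of 70K H100-class GPUs, with power adding very roughly $50M/year; how many "years of learning" that amounts to depends entirely on the models you want to train.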
They probably won't be using it now, because the phone in your pocket is likely more powerful. Moore's law did end, but data center hardware is still evolving orders of magnitude faster than forging presses.
If anything, allocate compute to citizens.
I find the language around "open source AI" to be confusing. With "open source" there's usually "source" to open, right? As in, there is human legible code that can be read and modified by the user? If so, then how can current ML models be open source? They're very large matrices that are, for the most part, inscrutable to the user. They seem akin to binaries, which, yes, can be modified by the user, but are extremely obscured to the user, and require enormous effort to understand and effectively modify.
"Open source" code is not just code that isn't executed remotely over an API, and it seems like maybe its being conflated with that here?
There is still a lot of modifying you can do with a set of weights, and they make great foundations for new stuff, but yeah we may never see a competitive model that's 100% buildable at home.
Edit: mkolodny points out that the model code is shared (under llama license at least), which is really all you need to run training https://github.com/meta-llama/llama3/blob/main/llama/model.p...
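To make that concrete: the published architecture code alone is enough to instantiate a randomly initialized, trainable model; it's the trained weights and data you can't reproduce. A sketch using the transformers reimplementation of the architecture (toy config values, not Llama 3.1's real hyperparameters):

    # Sketch: the architecture alone is enough to build a trainable model.
    # Config values are toy numbers, not Llama 3.1's actual hyperparameters.
    from transformers import LlamaConfig, LlamaForCausalLM

    config = LlamaConfig(hidden_size=512, num_hidden_layers=4,
                         num_attention_heads=8, intermediate_size=1376,
                         vocab_size=32000)
    model = LlamaForCausalLM(config)  # random weights, ready for training
    print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params")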
I believe this is the current draft: https://opensource.org/deepdive/drafts/the-open-source-ai-de...
People are framing this as if there were an open-source hierarchy, with "actual" open source requiring all training code to be shared. This is not obvious to me, as I'm not asking people who share open-source libraries to also share the tools they used to develop them. I'm also not asking them to share all the design documents/architecture discussions behind the software. It's sufficient that I can take the end result and reshape it in any way I desire.
This is coming from an LLM practitioner that finetunes models for a living; and this constant debate about open-source vs open-weights seems like a huge distraction vs the impact open-sourcing something like Llama has... this is truly a Linux-like moment. (at a much smaller scale of course, for now at least)
The source of a language model is the text it was trained on. Llama models are not open source (contrary to their claims), they are open weight.
There is still a lot you can do with weights, like fine-tuning, and that is arguably more useful anyway, since retraining the entire model would cost millions in compute.
- If we start with the closed training set, that is closed and stolen, so call it Stolen Source.
- What is distributed is a bunch of float arrays. The Llama architecture is published, but not the training or inference code. Without code there is no open source. You might as well call a compiler book open source because it tells you how to build a compiler.
Pure marketing, but predictably many people follow their corporate overlords and eagerly adopt the co-opted terms.
Reminder again that FB is not releasing this out of altruism, but because they have an existing profitable business model that does not depend on generated chats. They probably do use it internally for tracking and building profiles, but that is the same as using Linux internally, so they release the weights to destroy the competition.
Isn't price dumping an antitrust issue?
If everyone open sources their AI code, Meta can snatch the bits that help them without much fear of helping their direct competitors.
"Finding an agreement on what constitutes Open Source AI is the most important challenge facing the free software (also known as open source) movement. European regulation already started referring to "free and open source AI", large economic actors like Meta are calling their systems "open source" despite the fact that their license contain restrictions on fields-of-use (among other things) and the landscape is evolving so quickly that if we don't keep up, we'll be irrelevant."
[1] https://fosdem.org/2024/schedule/event/fosdem-2024-2805-movi... defining-open-source-ai/
I'm not sure if facebook has done that
FB's strategy is that they are fine being just a user, and fine with ruining competitors' businesses with good-enough free alternatives while collecting awards as saviors of whatever.
Does the training data require permission from the copyright holder to use? Are the weights really open source or more like compiled assembly?
The actual point that matters is that these models are available for most people to use for a lot of stuff, and this is way way better than what competitors like OpenAI offer.
.. the thing is, we have not dealt with LLMs much; it's hard to say what can be considered an open source LLM just yet, so we use that as a metaphor for now
This post is an ad and trying to paint these things as something they aren't.
- No more vendor lock-in
- Instead of just wrapping proprietary API endpoints, developers can now integrate AI deeply into their products in a very cost-effective and performant way
- Price race to the bottom with near-instant LLM responses at very low prices are on the horizon
As a founder, it feels like a very exciting time to build a startup as your product automatically becomes better, cheaper, and more scalable with every major AI advancement. This leads to a powerful flywheel effect: https://www.kadoa.com/blog/ai-flywheel
Maybe a big price war while the market majors fight it out for positioning, but they still need to make money off their investments, so someone is going to have to raise prices at some point, and you'll be locked into their system if you build on it.
Including adtech models, which are predominantly cloud-based.
And so the winners, in the end, will be the models that have mechanisms for curating out such misapplied weighting, and the organizations and individuals who make accurate adjustments to those models: the ones where truth has been most carefully honed.
This is not altruism, although it's still great for devs and startups. All of FB's GPU investment is primarily for new AI products: "friends", recommendations, and selling ads.
https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
* they need LLMs that they can control for features on their platforms (Fb/Instagram, but I can see many use cases on VR too)
* they cannot sell it. They have no cloud services to offer.
So they would spend this money anyway, but to compensate for some of the losses they decided to use it to fix their PR by keeping developers happy.
Given the mountain of GPUs they bought at precisely the right moment I don't think that's entirely accurate
> A complement is a product that you usually buy together with another product. Gas and cars are complements. Computer hardware is a classic complement of computer operating systems. And babysitters are a complement of dinner at fine restaurants. In a small town, when the local five star restaurant has a two-for-one Valentine’s day special, the local babysitters double their rates. (Actually, the nine-year-olds get roped into early service.)
> All else being equal, demand for a product increases when the prices of its complements decrease.
Smartphones are a complement of Instagram. VR headsets are a complement of the metaverse. AI could be a component of a social network, but it's not a complement.
Someone can correct me here, but AFAIK we don't even know which datasets were used to train these models, so why should we even use "open" to describe Llama? This is more similar to freeware than to an open-source project.
[1] https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/202...
In fairness to Llama, the source code itself (though not the training data) is available to access, although not really under a license that many would consider open source.
This means they need content that will grab attention, and creating open source models that allow anyone to create any content on their own becomes good for Meta. The users of the models can post it to their Instagram/FB/Threads account.
Releasing an open model also releases Meta from the burden of having to police the content the model generates, once the open source community fine-tunes the models.
Overall, this is a good business move for Meta. The post doesn't really talk about the true benefit, instead moralizing about open source, but it is a sound business move nonetheless.
1. Is there such a thing as 'attention grabbing AI content'? Most AI content I see is the opposite of 'attention grabbing'. The Kindle store is flooded with this garbage and none of it is particularly 'attention grabbing'.
2. Why would creation of such content, even if it was truly attention grabbing, benefit Meta in particular?
3. How would proliferation of AI content lead to more ad spend in the economy? Ad budgets won't increase because of AI content.
To me this is a typical Zuckerberg play. Attach Meta's name to whatever is trendy at the moment, like the (now forgotten) metaverse, cryptocoins, and a bunch of other failed stuff that was trendy for a second. Meta is NOT a Gen AI company (or a metaverse company, or a crypto company), however much he is scamming (more like colluding with) the market into believing it. A mere distraction from slowing user growth on ALL of Meta's apps.
ppl seem to have just forgotten this https://en.wikipedia.org/wiki/Diem_(digital_currency)
More important are the products that Meta will be able to make if the industry standardizes on Llama. They would have a front seat: not just access to the latest unreleased models, but also setting the direction of progress and what next-gen LLMs optimize for. If you're Twitter or Snap or TikTok, or otherwise compete with Meta on product, then good luck trying to keep up.
That is why they hopped on the Attention is All You Need train
Then all other visual AI content will be banned. If that is where legislation is heading.
But I have strong doubts they (or any other company) actually believe what they are saying.
Here is the reality:
- Facebook is spending untold billions on GPU hardware.
- Facebook is arguing in favor of open sourcing the models, that they spent billions of dollars to generate, for free...?
It follows that companies with much smaller resources (money) will not be able to match what Facebook is doing. Seems like an attempt to kill off the competition (specifically, smaller organizations) before they can take root.
Through this lens, Meta's actions make more sense to me. Why invest billions in VR/AR? The answer is simple: don't get locked out of the next platform; maybe you can own the next one. Why invest in LLMs? Again, don't get locked out. Google and OpenAI/Microsoft are far larger and ahead of Meta right now, and Meta genuinely believes the best way to make sure they have an LLM they control is to make everyone else have an LLM they can control. That way community efforts are unified around their standard.
There is still, just about, a strong ethos (especially in the research teams) to chuck loads of stuff over the wall into open source (PyTorch, Detectron, SAM, Aria, etc.),
but it's seen internally as a two-part strategy:
1) strong recruitment tool (come work with us, we've done cool things, and you'll be able to write papers)
2) seeding the research community with a common toolset.
Meta wants to make sure they commoditize their complements: they don’t want a world where OpenAI captures all the value of content generation, they want the cost of producing the best content to be as close to free as possible.
Bravo! While I don't agree with Zuck's views and actions on many fronts, on this occasion I think he and the AI folks at Meta deserve our praise and gratitude. With this release, they have brought the cost of pretraining a frontier 400B+ parameter model to ZERO for pretty much everyone -- well, everyone except Meta's key competitors.[a] THANK YOU ZUCK.
Meanwhile, the business-minded people at Meta surely won't mind if the release of these frontier models to the public happens to completely mess up the AI plans of competitors like OpenAI/Microsoft, Google, Anthropic, etc. Come to think of it, the negative impact on such competitors was likely a key motivation for releasing the new models.
---
[a] The license is not open to the handful of companies worldwide which have more than 700M users.
For now, Meta seems to release Llama models in ways that don't significantly lock people into their infrastructure. If that ever stops being the case, you should fork rather than trust their judgment. I say this knowing full well that most of the internet is on AWS or GCP, most brick-and-mortar businesses use Windows, and carrying a proprietary smartphone is essentially required to participate in many aspects of the modern economy. All of this is a mistake. You can't resist all lock-in; the players involved effectively run the world. You should still try where you can, and we should still be happy when tech companies either slip up or make the momentary strategic decision to make this easier.
Also, the underdog always touts Open Source and standards, so it’s good to remain skeptical when/if tables turn.
We interviewed Thomas, who led Llama 2 and 3 post-training, in case you want to hear from someone closer to the ground on the models: https://www.latent.space/p/llama-3
"Commoditize Your Complement" is often cited here: https://gwern.net/complement
It's a proprietary dump of data you can't replicate or verify.
What were the sources? What datasets was it trained on? What were the training parameters? And so on and so on.
It is still far from zero.
Is it possible to run this with ollama?
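For the 8B and 70B variants, yes, hardware permitting (the 405B is another matter). A minimal sketch against Ollama's local REST API, assuming the llama3.1 tag has been pulled and the server runs on its default port:

    # Minimal sketch: query a locally pulled Llama 3.1 via Ollama's REST API.
    # Assumes `ollama pull llama3.1` was run; port 11434 is Ollama's default.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1", "prompt": "Say hello in one sentence.",
              "stream": False},
    )
    print(resp.json()["response"])

The 8B runs on a decent consumer GPU; the 70B wants aggressive quantization or multiple cards.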
Nope. Not one bit. Supporting F/OSS when it suits you in one area and then being totally dismissive of it in every other area should not be lauded. How about open sourcing some of FB's VR efforts?
Step 1. Chick-Fil-A releases a grass-fed beef burger to spite other fast-food joints, calls it "the vegan burger"
Step 2. A couple of outraged vegans show up in the comments, pointing out that beef, even grass-fed beef, isn't vegan
Step 3. Fast food enthusiasts push back: it's unreasonable to want companies to abide by this restrictive definition of "vegan". Clearly this burger is a gamechanger and the definition needs to adapt to the times.
Step 4. Goto Step 2 in an infinite loop
That's the difference between open source and free software.
I.e., the more important thing - the more "free" thing - is the licensing now.
E.g., I play around with different image diffusion models like Stable Diffusion and specific fine-tuned variations for ControlNet or LoRA that I plug into ComfyUI.
But I can't use it at work because of the licensing. I have to use InvokeAI instead of ComfyUI if I want to be careful, and can use only very specific image diffusion models without the latest and greatest fine-tuning. As others have said, the weights themselves are rather inscrutable. So we're building on more abstract shapes now.
But the key open thing is making sure (1) the tools to modify the weights are open and permissive (ComfyUI, related scripts or parts of both the training and deployment) and (2) the underlying weights of the base models and the tools to recreate them have MIT or other generous licensing. As well as the fine-tuned variants for specific tasks.
It's not going to be the naive construction in the future, where you take a base model and, as company A, produce company A's fine-tuned model and are done.
It's going to be a tree of fine-tuned models as a node-based editor like ComfyUI already shows and that whole tree has to be open if we're to keep the same hacker spirit where anyone can tinker with it and also at some point make money off of it. Or go free software the whole way (i.e., LGPL or equivalent the whole tree of tools).
In that sense unfortunately Llama has a ways to go to be truly open: https://news.ycombinator.com/item?id=36816395
In terms of inference and interface (since you mentioned comfy) there are many truly open source options such as vLLM (though there isn't a single really performant open source solution for inference yet).
Ok, first of all, has this really worked? AI moderators still can't catch the mass of obvious spam/bots on all their platforms, Threads included. Second, AI detection doesn't work, and with how much better the systems are getting, it probably never will, unless you keep the best models for yourself, and it is clear from the rest of the note that that is not Zuck's intention.
> As long as everyone has access to similar generations of models – which open source promotes – then governments and institutions with more compute resources will be able to check bad actors with less compute.
This just doesn't make sense. How are you going to prevent AI spam, AI deepfakes from causing harm with more compute? What are you gonna do with more compute about nonconsensual deepfakes? People are already using AI to bypass identity verification on your social media networks, and pump out loads of spam.
I don't think that's true. I don't think even the best privately held models will be able to detect AI text reliably enough for that to be worthwhile.
I still agree with his general take - bad actors will get these models or make them themselves, you can't stop it. But the logic about compute power is odd.
FB was notorious for censorship. Anyway, what is with the "actions/actors" terminology? This is straightforward totalitarian language.
This also has the important effect of neutralizing the critique of US Government AI regulation because it will democratize "frontier" models and make enforcement nearly impossible. Thank you, Zuck, this is an important and historic move.
It also opens up the market to a lot more entry in the area of "ancillary services to support the effective use of frontier models" (including safety-oriented concerns), which should really be the larger market segment.
Plus there's still the spectre of SB-1047 hanging around.
Is the vision here to treat LLM-based AI as a "public good", akin to a utility provider in a civilized country (taxpayer funded, govt maintained, non-for-profit)?
I think we could arguably call this "open source" when all the infra blueprints, scripts, and configs are freely available for anyone to try to duplicate the state of the art (resource and grokking requirements notwithstanding).
Meta could change the license on later Llama versions to kill your business, and you'd have no options, as you don't know how they trained it and don't have the budget to retrain it yourself.
It's not much more free than binary software.
The whole thing is interesting, but this part strikes me as potentially anticompetitive reasoning. I wonder what the lines are that they have to avoid crossing here?
"Commoditize your complements" is an accepted strategy. And while pricing below cost to harm competitors is often illegal, the reality is that the marginal cost of software is zero.
Which open-source license has such restrictions and clauses?
C'mon folks, they're opening up for free to 99.99% of potential users what cost hundreds of millions of dollars, if not in the ballpark of a billion.
Let's appreciate that, instead of focusing on semantics for a while.
The HPC domain (data- and compute-intensive applications that typically need vector, parallel, or other such architectures) has been around for the longest time, but confined to academic/government tasks.
LLMs, with their famous "matrix multiply" at their very core, are basically demolishing an ossified frontier where a few commercial entities (Intel, Microsoft, Apple, Google, Samsung, etc.) have defined for decades what computing looks like for most people.
Assuming that the genie is out of the bottle, the question is: what is the shape of end-user devices that are optimally designed to run compute-intensive open source algorithms? The "AI PC" is already a marketing gimmick, but could it be that Linux desktops and smartphones will suddenly be "AI natives"?
For sure it's a transformational period, and the landscape at T+10 yrs could be drastically different...
I think it's interesting to think about this question of open source, benefits, risk, and even competition, without all of the baggage that Meta brings.
I agree with the FTC, that the benefits of open-weight models are significant for competition. The challenge is in distinguishing between good competition and bad competition.
Some kind of competition can harm consumers and critical public goods, including democracy itself. For example, competing for people's scarce attention or for their food buying, with increasingly optimized and addictive innovations. Or competition to build the most powerful biological weapons.
Other kinds of competition can massively accelerate valuable innovation. The FTC must navigate a tricky balance here — leaning into competition that serves consumers and the broader public, while being careful about what kind of competition it is accelerating that could cause significant risk and harm.
It's also obviously not just "big tech" that cares about the risks behind open-weight foundation models. Many people have written about these risks even before it became a subject of major tech investment. (In other words, A16Z's framing is often rather misleading.) There are many non-big tech actors who are very concerned about current and potential negative impacts of open-weight foundation models.
One approach which can provide the best of both worlds, is for cases where there are significant potential risks, to ensure that there is at least some period of time where weights are not provided openly, in order to learn a bit about the potential implications of new models.
Longer-term, there may be a line where models are too risky to share openly, and it may be unclear what that line is. In that case, it's important that we have governance systems for such decisions that are not just profit-driven, and which can help us continue to get the best of all worlds. (Plug: my organization, the AI & Democracy Foundation; https://ai-dem.org/; is working to develop such systems and hiring.)
i am not down with this concept of the chattering class deciding what are good markets and what are bad, unless it is due to broad-based and obvious moral judgements.
But this is really positive stuff and it’s nice to view my time there through the lens of such a change for the better.
Keep up the good work on this folks.
Time to start thinking about opening up a little on the training data.
Dead internet theory is very much happening in real time, and I dread what's about to come since the world has collectively decided to lose their minds with this AI crap. And people on this site are unironically excited about this garbage that is indistinguishable from spam getting more and more popular. What a fucking joke
I'm excited about the former since AI has massively improved my productivity as a programmer to a point where I can't imagine going back. Everything is not black or white and people can be excited about one part of something and hate another at the same time.
* * *
On "collective losing of minds", you might appreciate this quote from 1841 (!) by Charles MacKay. I quoted it in the past[1] here, but is worth re-posting:"In reading the history of nations, we find that, like individuals, they have their whims and their peculiarities; their seasons of excitement and recklessness, when they care not what they do. We find that whole communities suddenly fix their minds upon one object, and go mad in its pursuit; that millions of people become simultaneously impressed with one delusion, and run after it, till their attention is caught by some new folly more captivating than the first [...]
"Men, it has been well said, think in herds; it will be seen that they go mad in herds, while they only recover their senses slowly, and one by one."
— from MacKay's book, 'Extraordinary Popular Delusions and the Madness of Crowds'
Personally I have tons of creative ideas which I think would be interesting and engaging but for which I lack the resources to bring into this world, so I'm hoping that in the long term AI tools can help bridge this gap. I'm really hopeful for a world where people from all over the world can share their creative stories, rather than being mostly limited to a few rich people in Hollywood.
Unfortunately I do expect this to end up being the minority of content, especially as we continue being flooded by increasing amounts of trash. But maybe that's just opening up the opportunity for someone to develop new content curation tools. If anything, even before the rise of AI stuff there were mountains of content, and we saw with the rise of TikTok that a good recommendation algorithm still leaves room for new platforms.
Maybe they’ll go outside.
The AI model complements the platform, and the platform is the money maker. They hold the belief that open sourcing their tools benefits their platform in the long run, which is why they're doing it. And in doing so, they aren't under the control of any competitors.
I would say it's more like a grocery store providing free parking, a bus stop, self-checkout, online menu, and free delivery.
Not the usual nation-state rhetoric, but something that justifies the claim that closed source leads to a better user experience and fewer security and privacy issues.
An ecosystem that benefits vendors, customers, and the makers of closed source?
Are there historical analogies other than Microsoft Windows or Apple iPhone / iOS?
But they still have 70 thousand people (a small country) doing _something_. What are they doing? Updating Facebook UI? Not really, the UI hasn't been updated, and you don't need 70 thousand people to do that. Stuff like React and Llama? Good, I guess, we'll see how they make use of Llama in a couple of years. Spellcheck for posts maybe?
This is a very important concern in Health Care because of HIPAA compliance. You can't just send your data over the wire to someone's proprietary API. You would at least need to de-identify your data. This can be a tricky task, especially with unstructured text.
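To illustrate why it's tricky: naive pattern-based scrubbing, sketched below with made-up patterns, only catches structured identifiers; names, addresses, and rare conditions buried in free text need NER models and human review.

    # Toy de-identification sketch: regex-only, illustrative, NOT HIPAA-grade.
    # Real pipelines layer NER models, dictionaries, and human review on top.
    import re

    PATTERNS = {
        "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
        "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "MRN":   re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
    }

    def scrub(text: str) -> str:
        for label, pattern in PATTERNS.items():
            text = pattern.sub(f"[{label}]", text)
        return text

    note = "Pt. John Smith, MRN: 483921, called from 555-867-5309 re: meds."
    print(scrub(note))  # the name still leaks through; that's the hard part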
---
Some observations:
* The model is much better at trajectory correcting and putting out a chain of tangential thoughts than other frontier models like Sonnet or GPT-4o. Usually, these models are limited to outputting "one thought", no matter how verbose that thought might be.
* I remember in Dec of 2022 telling famous "tier 1" VCs that frontier models would eventually be like databases: extremely hard to build, but the best ones will eventually be open and win as it's too important to too many large players. I remember the confidence in their ridicule at the time but it seems increasingly more likely that this will be true.
Okay then Mark. Replace "modern AI models" with "social media" and repeat this statement with a straight face.
It's a bit buggy but it is fun.
Disclaimer: I am the author of L2E
On a more serious note, I don't really buy his arguments about safety. First, widespread AI does not reduce unintentional harm but increases it, because accident rates compound. Second, the chance of success for threat actors will increase, because of the asymmetric advantage of gaining access to all open information while hiding their own. But there is no reversing it at this point; I'll enjoy it while it lasts. AGI will come sooner or later anyway.
1. Software: this is all Pytorch/HF, so completely open-source. This is total parity between what corporates have and what the public has.
2. Model weights: Meta and a few other orgs release open models - as opposed to OpenAI's closed models. So, ok, we have something to work with.
3. Data: to actually do anything useful you need tons of data. This is beyond the reach of the ordinary man, setting aside the legality issues.
4. Hardware: GPUs, which are extremely expensive. Not just that, even if you have the top dollars, you have to go stand in a queue and wait for O(months), since mega-corporates have gotten there before you.
For inference, you need 1, 2, and 4. For training (or fine-tuning), you need all of these. With newer and larger models like the latest Llama, 4 is truly beyond the reach of ordinary entities.
This is NOTHING like open source, where a random guy can edit/recompile/deploy software on a commodity computer. With LLMs, once Data and Hardware are in the equation, the playing field is completely stacked. This thread has a bunch of people discussing the nuances of 1 and 2, but this bike-shedding only hides the basic point: control of LLMs is for mega-corps, not for individuals.
Open-source code in the past was fantastic because the West had a monopoly on CPUs and computers. Sharing and contributing were amazing, while it was ensured that tyrants couldn't use this tech to harm people, simply because they didn't have the hardware to run it.
But now, things are different. China is advancing in chip technology, and Russia is using open-source AI to harm people at scale today, with auto-targeting drones being just the start. The Red Sea conflict, etc.
And somehow, Zuckerberg keeps finding ways to mess up people's lives, despite having the best intentions.
Right now you can build a semi-autonomous drone with AI to kill people for ~$500-700. The western world will still use safe and secure commercial models, while the new axis of evil will use models based on Meta's, or any other open source model, to do whatever harm they can imagine, with not a hint of control.
This particular model? Fine-tune it to help develop a nuclear bomb, using all the research that a government at that level can gather, at scale. Killer drone swarms, etc. Once the knowledge is public, these models can be the base that gives expert-level knowledge to anyone who wants it, uncensored. Especially if you are a government that wants to destroy a peaceful order for whatever reason.
https://www.vox.com/future-perfect/24151437/ai-israel-gaza-w...
https://www.972mag.com/mass-assassination-factory-israel-cal...
https://www.theguardian.com/world/2024/apr/03/israel-gaza-ai...
Open weights (and open inference code) is NOT open source, but just some weak open washing marketing.
The model that comes closest to being TRULY open is AI2’s OLMo. See their blog post on their approach:
https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...
I think the only thing they’re not open about is how they’ve curated/censored their “Dolma” training data set, as I don’t think they explicitly share each decision made or the original uncensored dataset:
https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-co...
By the way, OSI is working on defining open source for AI. They post weekly updates to their blog. Example:
https://opensource.org/blog/open-source-ai-definition-weekly...
You’re missing a then to your if. What happens if it’s “truly” open per your definition versus not?
I imagine its main use would be to train other models by distilling them down with LoRA/quantization, etc. (assuming we have a tokenizer), or by using them to generate training data for smaller models directly, as sketched below.
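A rough sketch of the synthetic-data route, assuming a local Ollama-style endpoint (the model tag, endpoint, and prompts are all placeholders):

    # Sketch: sample completions from a big model to build a small-model
    # training set. Model tag and endpoint are assumptions/placeholders.
    import json, requests

    prompts = ["Explain HTTP caching briefly.", "What is a B-tree?"]
    with open("distill_data.jsonl", "w") as f:
        for p in prompts:
            r = requests.post("http://localhost:11434/api/generate",
                              json={"model": "llama3.1:70b", "prompt": p,
                                    "stream": False})
            f.write(json.dumps({"prompt": p,
                                "completion": r.json()["response"]}) + "\n")
    # A smaller model can then be fine-tuned (e.g. with LoRA) on this file.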
But, I do think there is always a way to share without disclosing too many specifics, like this[1] lecture from this year's spring course at Stanford. You can always say, for example:
- The most common technique for filtering was using voting LLMs (without disclosing said LLMs or the quantity of data).
- We built on top of a filtering technique for removing poor code using ____ by ____ authors (without disclosing exactly how you filtered, but saying that you had to filter).
- We mixed a certain proportion of this data with that data to make it better (without saying what proportion).
[1] https://www.youtube.com/watch?v=jm2hyJLFfN8&list=PLoROMvodv4...
I was thinking today about Musk, Zuckerberg and Altman. Each claims that the next version of their big LLMs will be the best.
For some reason it reminded me of one apocryphal cause of WW1, which was that the kings of Europe were locked in a kind of ego driven contest. It made me think about the Nation State as a technology. In some sense, the kings were employing the new technology which was clearly going to be the basis for the future political order. And they were pitting their own implementation of this new technology against the other kings.
I feel we are seeing a similar clash of kings playing out. The claims that this is all just business or some larger claim about the good of humanity seem secondary to the ego stakes of the major players. And when it was about who built the biggest rocket, it felt less dangerous.
It breaks my heart just a little bit. I feel sympathy, in some sense, for the AIs we will create, especially if they do reach the level of AGI. As another tortured analogy, it is like a bunch of competitive parents forcing their children into adversarial relationships to satisfy the parents' egos.
however, the "open-source" narrative is being pushed a bit too much like descriptive ML models were called "AI", or applied statistics "data science". with reinforced examples such as this, we start to lose the original meaning of the term.
the current approach of startups or small players "open-sourcing" their platforms and tools as a means to promote network effect works but is harmful in the long run.
you will find examples of terraform and red hat happening, and a very segmented market. if you want the true spirit of open-source, there must be a way to replicate the weights through access to training data and code. whether one could afford millions of GPU hours or not, real innovation would come from remixing the internals, and not just fine-tuning existing stuff.
i understand that this is not realistically going to ever happen, but don't perform deceptive marketing at the same time.
*I reserve the right to remove this praise if they abuse this open source model position in the future.
With the new model, I am seeing a lot of claims about how open source it is and how it can be built upon. Is it now completely open source, or similar to their last models?
Gradient descent works on these models just like the prior ones.
What people are complaining about (totally unreasonably in my view) is obviously Meta is not "open sourcing" all the training data, so nobody can retrain the model from scratch themselves. This argument to me is just silly. The whole point of these models is they distil pretraining on massive data sets you wouldn't have access to otherwise. If you insist on them releasing the data set, they will have to cut it down to 0.1% of the size and you will be getting what you had access to already in the first place.
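For concreteness on the "gradient descent works" point, the usual parameter-efficient route looks roughly like this; the libraries are the standard Hugging Face stack, and the model id and hyperparameters are illustrative assumptions:

    # Rough LoRA fine-tuning sketch with transformers + peft.
    # Model id is the assumed hub name (gated access applies); the
    # hyperparameters are placeholders, not tuned values.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_id = "meta-llama/Meta-Llama-3.1-8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    lora = LoraConfig(r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"],
                      lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # a tiny fraction of the full model
    # From here, a standard Trainer loop applies gradient descent as usual.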
My impression is that AI, if done correctly, will be the new way to build APIs over large data sets and information. It can't write code unless you want to dump billions of dollars into a solution with millions of dollars of operational costs, and as it stands it loses context too quickly to do advanced human tasks. BUT it is great at assembling data and information. You know what else is great at assembling data and information? APIs.
Think of it this way: if we can make it faster, and it trains on a data lake for a company, it could be used to return information faster than a nested micro-service architecture that is just a spiderweb of dependencies.
Because AI loses context, simple API requests could actually be more efficient.
Also, are there any "IP" rights attached at all to a bunch of numbers coming out of a formula that someone else calculated for you? (edit: after all, a "model" is just a matrix of numbers coming out of running a training algorithm that is not owned by Meta over training data that is not owned by Meta.)
Meta imposes a notification duty AND a request for another license (no mention of the details of these) for applications of their model with a large number of users. This is against the spirit of open source. (In practical terms it is not a showstopper, since you can easily switch models, although they all have subtly different behaviours and quality levels.)
> Third, a key difference between Meta and closed model providers is that selling access to AI models isn’t our business model. That means openly releasing Llama doesn’t undercut our revenue, sustainability, or ability to invest in research like it does for closed providers. (This is one reason several closed providers consistently lobby governments against open source.)
Maybe this is a strategic play to hurt other AI companies that depend on this business model?
Private repos are not being reproduced by any modern AI. Their source code is safe, although AI arguably lowers the bar to compete with them.
Having run many red teams recently as I build out promptfoo's red teaming featureset [0], I've noticed the Llama models punch above their weight in terms of accuracy when it comes to safety. People hate excessive guardrails and Llama seems to thread the needle.
Very bullish on open source.
Does anyone have details on exactly what this means or where/how this metric gets derived?
We mostly don’t all want or need the hardware to run these AIs ourselves, all the time. But, when we do, we need lots of it for a little while.
This is what Holochain was born to do. We can rent massive capacity when we need it, or earn money renting ours when we don’t.
All running cryptographically trusted software at Internet scale, without the knowledge or authorization of commercial or government “do-gooders”.
Exciting times!
Still huge props to them for doing what they do.
Mostly unrelated to the correctness of the article, but this feels like a bad argument. AFAIK, Anthropic/OpenAI/Google are not having issues with their weights being leaked (are they?). Why is it that Meta's model weights are?
Llama 3.1 Official Launch
By giving away higher and higher quality models, they undermine the potential return on investment for startups who seek money to train their own. Thus investment in foundation model building stops and they control the ecosystem.
- Open training data (this is very big)
- Open training algorithms (does it include infrastructure code?)
- Open weights (result of previous two)
- Open runtime algorithm

- We need to control our own destiny and not get locked into a closed vendor.
- We need to protect our data.
- We want to invest in the ecosystem that's going to be the standard for the long term.
Thank you Meta for being the bright light of ethical guidance for us all.
We don't get the data or training code. The small runtime framework is open source, but that's of little use, as it's largely fixed in implementation by the weights. Yes, we can fine-tune, but that is akin to modifying video games: we can do it, but there's only so much you can do within reasonable effort, and no one would call most video games 'open source'.*
It's freeware, and Meta's strategy is much more akin to the strategy Microsoft used with Internet Explorer to capture the web browser market. No one was saying God bless Microsoft for trying to capture the browser market with IE. There's nothing wrong with Meta's strategy; just don't call it open source.
*Weights are data, and so is the video/audio output of a video game. If we gave away that video game output for free, we wouldn't call the video game open source, as the myriad freeware games essentially demonstrate.
Can't wait to see how the landscape will look in 2027 and beyond.
The actual problem is running these models. Very few companies can afford the hardware to run these models privately. If you run them in the cloud, then I don't see any potential financial gain for any company to fine-tune these huge models just to catch up with OpenAI or Anthropic, when you can probably get a much better deal by fine-tuning the closed-source models.
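The raw memory arithmetic backs this up; the bytes-per-parameter figures are the standard ones, and the 80GB card is an assumption:

    # Back-of-envelope VRAM for weights alone (no KV cache or activations).
    import math

    params_b = {"8B": 8, "70B": 70, "405B": 405}
    bytes_per_param = {"fp16": 2, "int8": 1, "int4": 0.5}
    gpu_gb = 80  # assumed H100/A100-class card

    for name, b in params_b.items():
        for prec, byt in bytes_per_param.items():
            gb = b * byt
            gpus = max(1, math.ceil(gb / gpu_gb))
            print(f"{name} @ {prec}: ~{gb:.0f} GB (~{gpus} GPU(s))")

Even at 4-bit, the 405B's weights alone are ~200GB, so multi-GPU serving is unavoidable.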
Also this point:
> We need to protect our data. Many organizations handle sensitive data that they need to secure and can’t send to closed models over cloud APIs.
First, it's ironic that Meta is talking about privacy. Second, most companies will run these models in the cloud anyway. You can run OpenAI via Azure Enterprise and Anthropic on AWS Bedrock.
    Llama 3 Training System
    Total: 19.2 exaFLOPS
                      |
        +-------------+-------------+
        |                           |
    Cluster 1                   Cluster 2
    9.6 exaFLOPS                9.6 exaFLOPS
        |                           |
    +---+------+                +---+------+
    |          |                |          |
    12K GPUs   12K GPUs         12K GPUs   12K GPUs
    |          |                |          |
    [####]     [####]           [####]     [####]
    400+       400+             400+       400+
    TFLOPS/GPU TFLOPS/GPU       TFLOPS/GPU TFLOPS/GPU

If he really wants to replicate Linux's success against proprietary Unices, he needs to release Llama with some kind of GPL equivalent, one that forces everyone to play the open source game.
They provide their model, with weights and code, as "source available", and it looks like they allow for commercial use until a 700M monthly active user cap is surpassed. They also don't allow you to train other AI models with their model:
""" ... v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof). ... """
Has anyone tried that?
I hate how the moment when it's too late will be, by design, behind closed doors.
> This is one reason several closed providers consistently lobby governments against open source.
Is this substantially true? I've noticed a tendency of those who support the general arguments in this post to conflate the beliefs of people concerned about AI existential risk, some of whom work at the leading AI labs, with the position of the labs themselves. In most cases I've seen, the AI labs (especially OpenAI) have lobbied against any additional regulation on AI, including with SB1047[1] and the EU AI Act[2]. Can anyone provide an example of this in the context of actual legislation?
> On this front, open source should be significantly safer since the systems are more transparent and can be widely scrutinized. Historically, open source software has been more secure for this reason.
This may be true if we could actually understand what is happening in neural networks, or train them to consistently avoid unwanted behaviors. As things are, the public weights are simply inscrutable black boxes, and the existence of jailbreaks and other strange LLM behaviors shows that we don't understand how our training processes create models' emergent behaviors. The capabilities of these models and their influence are growing faster than our understanding of them, and than our ability to steer them to behave precisely how we want, and that will only get harder as the models get more powerful.
> At this point, the balance of power will be critical to AI safety. I think it will be better to live in a world where AI is widely deployed so that larger actors can check the power of smaller bad actors.
This paragraph ignores the concept of offense/defense balance. It's much easier to cause a pandemic than to stop one, and cyberattacks, while not as bad as pandemics, seem to also favor the attacker (this one is contingent on how much AI tools can improve our ability to write secure code). At the extreme, it would clearly be bad if everyone had access to an antimatter weapon large enough to destroy the Earth; at some level of capability, we have to limit the commands an advanced AI will follow from an arbitrary person.
That said, I'm unsure if limiting public weights at this time would be good regulation. They do seem to have some benefits in increasing research around alignment/interpretability, and I don't know if I buy the argument that public weights are significantly more dangerous from a "misaligned ASI" perspective than many competing closed companies. I also don't buy the view of some in the leading labs that we'll likely have "human level" systems by the end of the decade; it seems possible but unlikely. But I worry that Zuckerberg's vision of the future does not adequately guard against downside risks, and is not compatible with the way the technology will actually develop.
[1] https://thebulletin.org/2024/06/california-ai-bill-becomes-a...
Only the big players can afford to push go, and FB would love to see OpenAI’s code so they can point it to their proprietary user data.
So about all the bots and sock puppets on social media..
Claude is supposed to be better, but it is also even more locked down than ChatGPT.
Word will let me write a manifest for a new Nazi party, but Claude is so locked down that it won't find a cartoon in a picture and Gemini... well.
If AIs are not to harm society, they need to enable us to think in new ways.
And you can't even try it without an FB/IG account.
Zuck will never change.
Why do people keep mislabeling this as Open Source? The whole point of calling something Open Source is that the "magic sauce" of how to build it is publicly available, so I could build it myself if I had the means. But without the training data publicly available, could I train Llama 3.1 if I had the means? No wonder Zuckerberg doesn't start with defining what Open Source actually means, as the blogpost would then have lost all meaning from the get-go.
Just call it "Open Model" or something. As it stands right now, the meaning of Open Source is being diluted by all these companies pretending to do one thing while actually doing something else.
I initially got very excited seeing the title and the domain, but was hopelessly sad after reading through the article and realizing they're still trying to pass their artifacts off as Open Source projects.
This is hard to disagree with.
Not that anyone would go buy 100,000 H100s to train their own Llama, but words matter. Definitions matter.
https://raw.githubusercontent.com/meta-llama/llama-models/ma...
> 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
The definition of free software (and open source, for that mater), is well-established. The same definition applies to all programs, whether they are "AI" or not. In any case, if a program was built by training against a dataset, the whole dataset is part of the source code.
Llama is distributed in binary form, and it was built based on a secret dataset. Referring to it as "open source" is not ignorance, it's malice.
1. Meta pushed engineering wages higher across the industry.
2. They promote high performing engineers very quickly. There are engineers making 7 figures there with just a few years experience.
3. They have open sourced the most important frameworks: React and Pytorch
This company is a guiding light forcing the hand of other large corporations. Mark Zuckerberg is a hero, and has done a fantastic job
I don't see open source being able to compete with the cutting-edge proprietary models. There's just not enough money. GPT-5 will take an estimated $1.2 billion to train. MS and OpenAI are already talking about building a $100 billion training data center.
How can you compete with that if your plan is to give away the training result for free?
Because they sold the resultant code and systems built on it for money... this is the gold miner saying that all shovels and jeans should be free.
Am I happy Facebook open sources some of their code? Sure, I think it's good for everyone. Do I think they're talking out of both sides of their mouth? Absolutely.
Let me know when Facebook opens up the entirety of their Ad and Tracking platforms and we can start talking about how it's silly for companies to keep software closed.
I can say with 100% confidence if Facebook were selling their AI advances instead of selling the output it produces, they wouldn't be advocating for everyone else to open source their stacks.
* You can't use them for any purpose. For example, the license prohibits using these models to train other models.
* You can't meaningfully modify them, given there is almost no information available about the training data, how they were trained, or how the training data was processed.
As such, the model itself is not available under an open source license and the AI does not comply with the "open source AI" definition by OSI.
It's an utter disgrace for Meta to write such a blogpost patting themselves on the back while lying about how open these models are.