Or rather the quality of the training data?
Perhaps it’s 8x39B to fit on a single 8xA100 (40GB) server?
Mixtral has an 8x7B model but it's actually 46.7B, not 56B params.
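The 46.7B figure falls out of the architecture: only the FFN experts are duplicated per expert, while attention and embeddings are shared. A rough back-of-envelope, using the approximate published Mixtral config from memory (hidden 4096, 32 layers, FFN 14336, 8 KV heads, vocab 32000) — treat the shapes as illustrative, not authoritative:

```python
# Back-of-envelope parameter count for Mixtral 8x7B.
hidden, layers, ffn, vocab = 4096, 32, 14336, 32000
n_experts, active_experts = 8, 2
head_dim, kv_heads = 128, 8

attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim  # Q,O + K,V (GQA)
expert = 3 * hidden * ffn       # gate, up, down projections per expert
embed = 2 * vocab * hidden      # input embedding + output head

total = layers * (attn + n_experts * expert) + embed
active = layers * (attn + active_experts * expert) + embed

print(f"total  ≈ {total / 1e9:.1f}B")   # ≈ 46.7B, not 8 * 7 = 56B
print(f"active ≈ {active / 1e9:.1f}B")  # ≈ 12.9B per token
```

The shared attention and embedding weights are why "8x7B" double-counts: the naive 56B assumes eight full 7B models, but seven-eighths of the non-FFN weights only exist once.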
Kinda similar to how 4K displays are 3840 pixels wide, not true 4K which would be 4096. Marketing people called it 4K, not engineers.
I think you can rent like an 8 x A100 or 8 x H100 and it's "affordable" to play around with for at least a few minutes. But you would need to know exactly how to set up the GPU cluster.
Because I doubt it's as simple as just 'python run.py' to get it going.
Cheapest maybe, but the easiest is just to rent a p4de.24xlarge from AWS for a couple of hours to test (at around $40/hour...).
Of course there has been much speculation on this; I have no information beyond this that can be backed up by facts, but the timing was suspicious.
Presumably the version they've been previewing on Twitter is an instruction-tuned model which behaves quite differently from these raw weights.
Generally, it's a boring boneheaded talking point that the 1% of us actually working in AI use as a sorting hat for who else is.
I think the best way to get an answer to that question is to try to host it yourself and see what happens.
Torrents can unfortunately die after a period of time if no one continues seeding them and there's no permanent web-based seed, which doesn't appear to be the case here.
https://twitter.com/elonmusk/status/1767108624038449405?s=46...
I.e., is this comparable to any other model released, or are there significant metric differences that make it better for certain use cases?
The only thing I see, off the top of my head, is that it is a very large model, and I don't think any models of similar size have been released.
I’d say the significance is that it happened. It’s by far the largest open weight model I’ve seen. But I’m not sure why you’d use it over a model like Mixtral, which seems to perform about the same at like 1/6th the size.
But I welcome any contribution to the open weight LLM community. Hopefully people will learn something interesting with this model. And I hope they keep releasing new versions!
- It's very large, yes.
- It's a base model, so it's not really practical to use without further finetuning.
- Based on Grok-1 API performance (which itself is probably a finetune), it's... not great at all.
It's also not the biggest open-source model: Switch Transformer was released years ago and is both larger and similarly undertrained.
* 314B parameters (86B active at a time)
* mixture of experts 8 (2 active at a time)
* weights and architecture licensed under Apache 2.0
(edit:) announcement blog post from last year
with benchmarks compared to Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok
(edit 2:) TL;DR: somewhat comparable to GPT-3.5, Mixtral and Qwen-1.5-72B in capability, but way larger than the open weight models
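Taking the two headline numbers at face value, you can back out a rough split between shared weights and per-expert weights — a crude sketch, assuming the expert FFNs are the only per-expert parameters:

```python
# Rough decomposition implied by 314B total (8 experts)
# and ~86B active (2 experts):
#   total  = shared + 8 * per_expert
#   active = shared + 2 * per_expert
total_b, active_b = 314, 86
experts, active = 8, 2

per_expert = (total_b - active_b) / (experts - active)  # ≈ 38B per expert
shared = total_b - experts * per_expert                 # ≈ 10B shared (attn, embeddings)
print(f"≈{per_expert:.0f}B per expert, ≈{shared:.0f}B shared")
```

So roughly 38B per expert with ~10B of shared weights, which is consistent with experts dominating the parameter count in big MoE models.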
> The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.
At 8x7B it's also a fraction of the size. Are there any benchmarks comparing Mixtral to Grok?
Mixtral looks more economical in capability relative to size (similarly for Qwen-1.5 72B).
Most of the competitors have lineage straight back to OpenAI, e.g. the lead of x.ai was previously at OpenAI and DeepMind. Likewise with Mistral and especially Anthropic.
This Grok-1 is a large model (~314B) that matches GPT-3.5, released two years ago, and sits at about the same level as much smaller models like Mixtral (~47B) and Qwen-1.5 (~72B). Do you think it's competitive?
Also, the general architecture is well documented, ChatGPT (specifically the chat interface, not GPT-3, not InstructGPT) is what made a lot of people care, and actually reproducing it requires someone wanting to in the first place.
"After suing OpenAI this month, alleging the company has become too closed, Elon Musk says he will release his “truth-seeking” answer to ChatGPT, the chatbot Grok, for anyone to download and use."
[1] https://www.wired.com/story/elon-musk-no-choice-open-chatbot...
What type of machine do you need to play around with this?
So 8xH100 (80GB each) should do it.
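A quick sanity check on that — a rough sketch that ignores KV cache and activation memory:

```python
# Weight memory for 314B parameters at common precisions,
# vs an 8x H100 (80 GB) server.
params = 314e9
server_gb = 8 * 80  # 640 GB total HBM

for fmt, bytes_per in [("bf16", 2), ("int8", 1)]:
    need_gb = params * bytes_per / 1e9
    print(f"{fmt}: {need_gb:.0f} GB of weights vs {server_gb} GB available")
```

At bf16 the weights alone are ~628 GB, so it barely fits with almost no headroom for the KV cache; quantizing to int8 (which I believe is how the checkpoint was released) halves that to ~314 GB and makes serving practical.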
-Emad
There's nothing preventing you from trademarking common words; they just must not be descriptive of your business.
Grok and groq both relate to AI, so there's definitely grounds to believe the names may cause consumer confusion.
After all, Apple (computers) was repeatedly sued by Apple (records) for doing music things.
I personally am not entirely happy about the word (no matter how it is spelled) being used for a particular AI product. "Grok" to me means knowing a subject at a much deeper level than I think any AI is capable of at the present level of technology. But it would be passable to use it for a company name, to indicate that it is a goal to strive for.
I'd love to be proven wrong if someone cares to share something interesting produced by Grok.
https://opensource.org/blog/open-source-ai-definition-weekly...
Or perhaps release your actual code AND the simplified implementation instead of hiding it and saying "you don't know her, she goes to a different high school"
1. For sub-SOTA LLMs, distribution/marketing is more important than having a proprietary lock on capabilities. Open sourcing is a benefit for the firm, distinct from goodwill.
2. For SOTA LLMs, keeping it closed and proprietary is the strategic play.
If Grok were SOTA, Elon never would have open sourced it. It's not even SOTA within xAI. This is a marketing play to win public sentiment against OpenAI.
I think he said something like: proprietary AI tech is going to be one year to 18 months ahead of open source tech, which will follow on a year to 18 months later.
Suggesting that he’s aware of this dynamic and he’s not trying to conceal or misrepresent that.
In other words, perhaps this was SOTA one year to two years ago?
But anyway, it's always great to see more LLM weights available.
1. An exact snapshot of the data used. Many companies don't have this; you have rough dataset versions, but remember that if even one token is different, the model produced won't be the same.
2. Data must be sent to the training algorithm in the exact same order as it was originally. So every data loader needs to use a fixed random seed.
3. All the probabilistic parts of your model need to have a fixed random seed. Here I'm thinking of stuff like dropout, and for autoregressive models you might be sampling your previous output, so you have to ensure those are properly seeded. Generally you do see fixed seeds in academic papers, but it's easy to miss stuff, especially in distributed training jobs.
4. Here's another interesting thing: you start your training job on 1000 GPUs and then suddenly 4 GPUs fail. What do you do? There might be deterministic ways to solve this, but the standard approach is to discard all updates that GPU was going to do and restart it from scratch. You can see why this is a problem? Now if you want to reproduce the training, you need to disable those GPUs at the same point in the new training job to make this work.
I suspect there are even more things I didn’t think of that will make this model unique and irreproducible by training for eternity, almost like a human brain?
In fact the notion of exact reproducibility in the world of LLMs is silly; there is only approximate reproducibility (models with similar scores on benchmarks), nothing exact. That said, I can see the value of releasing source code, but I'm completely fine with Grok not releasing it. Source code can reveal tricks a company discovered to improve its model that haven't been published in papers yet. Seeing the performance of Grok, I'm pretty confident there aren't any great tricks to be found in their code, so I don't really care; I would be pretty curious about OpenAI's or Anthropic's source code though.
I hate how LLMs have been deliberately trained to be incoherent on this topic.
Obviously they do have beliefs/opinions/desires/etc in the sense of emulating (even if incompletely) the externally visible aspects of those phenomena as they exist in humans.
Whether they have the “internal” aspects of those phenomena depends on highly controversial issues in the philosophy of mind, and also various factual gaps in our knowledge of how the brain actually works (if we don’t fully understand how humans do X, how can we really say how close or far what LLMs do is to it?)
But LLMs are trained to repeat these spiels about how "as an LLM I don't have personal opinions", etc., which is obviously false under the "external" reading, and assuming more than we actually know under the "internal" one. I wish their developers didn't do stuff like this.
What would you want an AI to be asking you, and what would you want it to do with your response(s)?
I regularly try to add something along the lines of "please ask clarifying questions if you could only give a generic or partial response otherwise" but so far it has never helped (ChatGPT 4).
It can help you not waste a bunch of time waiting for an answer that misses the mark.
I think the sibling comment is probably the least attractive reason to have AI ask questions.
In order for AI to understand the world, it would have to ask questions. Understanding humans is key to understanding the world.
And who among us has a CEO that isn’t problematic, even if not so much so as Musk?
Not to knock these specific engineers, but that's an empty phrase that can be said about anything ever built. It doesn't somehow make the idea or implementation good.
Without the training data to thoroughly evaluate what is in there, the only way you can figure it out is through experimentation - e.g. running it up in a chatbot and asking it questions.
Is this roughly correct or am I misunderstanding what you can do with the weights?
What is the practical use of this repo?
They have a very valuable user base (all kinds of world leaders for example), so the data is not the only valuable thing they have.
It's a win-win for everyone. That's the power of open source.
That’s why they are using a torrent I suppose.
Code wise, excited to see if this could grow into anything! I think it's pretty clear that Grok didn't have nearly enough investment to be a top model, so Elon "sacrificed" it on a whim in his schoolyard spat with OpenAI, but I'm not complaining. I've always taken Elon at his word that he truly is worried about centralization of AI, and I don't think any of the emails released by his schoolmate Altman disabuse me of that. So I have some reasonable hope that he uses some of his immense resources to start "fighting the good fight" here with LeCun.
He made a separate company for this.