Or rather the quality of the training data?
Perhaps it’s 8x39B to fit on a single 8xA100 (40GB) server?
Mixtral has an 8x7B model but it's actually 46.7B, not 56B params.
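The 46.7B figure falls out of the architecture: only the FFN experts are duplicated per expert, while attention and embeddings are shared. A rough back-of-envelope, using the approximate published Mixtral config from memory (hidden 4096, 32 layers, FFN 14336, 8 KV heads, vocab 32000) — treat the shapes as illustrative, not authoritative:

```python
# Back-of-envelope parameter count for Mixtral 8x7B.
hidden, layers, ffn, vocab = 4096, 32, 14336, 32000
n_experts, active_experts = 8, 2
head_dim, kv_heads = 128, 8

attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim  # Q,O + K,V (GQA)
expert = 3 * hidden * ffn       # gate, up, down projections per expert
embed = 2 * vocab * hidden      # input embedding + output head

total = layers * (attn + n_experts * expert) + embed
active = layers * (attn + active_experts * expert) + embed

print(f"total  ≈ {total / 1e9:.1f}B")   # ≈ 46.7B, not 8 * 7 = 56B
print(f"active ≈ {active / 1e9:.1f}B")  # ≈ 12.9B per token
```

The shared attention and embedding weights are why "8x7B" double-counts: the naive 56B assumes eight full 7B models, but seven-eighths of the non-FFN weights only exist once.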
Kinda similar to how 4K displays are 3840 pixels wide, not true 4K which would be 4096. Marketing people called it 4K, not engineers.
I think you can rent like an 8 x A100 or 8 x H100 and it's "affordable" to play around with for at least a few minutes. But you would need to know exactly how to set up the GPU cluster.
Because I doubt it's as simple as just 'python run.py' to get it going.
Cheapest maybe, but the easiest is just to rent a p4de.24xlarge from AWS for a couple of hours to test (at around $40/hour...).
Of course there has been much speculation on this; I have no information beyond this that can be backed up by facts, but the timing was suspicious.
Presumably the version they've been previewing on Twitter is an instruction-tuned model which behaves quite differently from these raw weights.
Generally, it's a boring boneheaded talking point that the 1% of us actually working in AI use as a sorting hat for who else is.
I think the best way to get an answer to that question is to try to host it yourself and see what happens.
Torrents can unfortunately die after a period of time if no one continues seeding them and there's no permanent web-based seed, which doesn't appear to be the case here.
https://twitter.com/elonmusk/status/1767108624038449405?s=46...
I.e., is this comparable to any other model released, or are there significant metric differences that make it better for certain use cases?
The only thing I see, off the top of my head, is that it is a very large model, and I don't think any models of similar size have been released.
I’d say the significance is that it happened. It’s by far the largest open weight model I’ve seen. But I’m not sure why you’d use it over a model like Mixtral, which seems to perform about the same at like 1/6th the size.
But I welcome any contribution to the open weight LLM community. Hopefully people will learn something interesting with this model. And I hope they keep releasing new versions!
- It's very large, yes.
- It's a base model, so it's not really practical to use without further finetuning.
- Based on Grok-1 API performance (which itself is probably a finetune), it's... not great at all.
It's also not the biggest open-source model: Switch Transformer was released years ago and is both larger and similarly undertrained.
* 314B parameters (86B active at a time)
* mixture of experts 8 (2 active at a time)
* weights and architecture licensed under Apache 2.0
(edit:) announcement blog post from last year
with benchmarks compared to Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok
(edit 2:) TL;DR: somewhat comparable to GPT-3.5, Mixtral and Qwen-1.5-72B in capability, but way larger than the open weight models
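Taking the two headline numbers at face value, you can back out a rough split between shared weights and per-expert weights — a crude sketch, assuming the expert FFNs are the only per-expert parameters:

```python
# Rough decomposition implied by 314B total (8 experts)
# and ~86B active (2 experts):
#   total  = shared + 8 * per_expert
#   active = shared + 2 * per_expert
total_b, active_b = 314, 86
experts, active = 8, 2

per_expert = (total_b - active_b) / (experts - active)  # ≈ 38B per expert
shared = total_b - experts * per_expert                 # ≈ 10B shared (attn, embeddings)
print(f"≈{per_expert:.0f}B per expert, ≈{shared:.0f}B shared")
```

So roughly 38B per expert with ~10B of shared weights, which is consistent with experts dominating the parameter count in big MoE models.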
> The cover image was generated using Midjourney based on the following prompt proposed by Grok: A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines.
At 8x7B it's also a fraction of the size. Are there any benchmarks comparing Mixtral to Grok?
Mixtral looks more economical in capability relative to size (similarly for Qwen-1.5 72B).
Most of the competitors have lineage straight back to OpenAI, e.g. the lead of x.ai was previously at OpenAI and DeepMind. Likewise with Mistral and especially Anthropic.
This Grok-1 is a large model (~314B) that matches GPT-3.5, released two years ago, and sits at about the same level as much smaller models like Mixtral (~47B) and Qwen-1.5 (~72B). Do you think it's competitive?
Also, the general architecture is well documented, ChatGPT (specifically the chat interface, not GPT-3, not InstructGPT) is what made a lot of people care, and actually reproducing it requires someone wanting to in the first place.
"After suing OpenAI this month, alleging the company has become too closed, Elon Musk says he will release his “truth-seeking” answer to ChatGPT, the chatbot Grok, for anyone to download and use."
[1] https://www.wired.com/story/elon-musk-no-choice-open-chatbot...
What type of machine do you need to play around with this?
So 8xH100 (80GB each) should do it.
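A quick sanity check on that — a rough sketch that ignores KV cache and activation memory:

```python
# Weight memory for 314B parameters at common precisions,
# vs an 8x H100 (80 GB) server.
params = 314e9
server_gb = 8 * 80  # 640 GB total HBM

for fmt, bytes_per in [("bf16", 2), ("int8", 1)]:
    need_gb = params * bytes_per / 1e9
    print(f"{fmt}: {need_gb:.0f} GB of weights vs {server_gb} GB available")
```

At bf16 the weights alone are ~628 GB, so it barely fits with almost no headroom for the KV cache; quantizing to int8 (which I believe is how the checkpoint was released) halves that to ~314 GB and makes serving practical.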
-Emad
There's nothing preventing you from trademarking common words; they just must not be descriptive of your business.
Grok and groq both relate to AI, so there's definitely grounds to believe the names may cause consumer confusion.
After all, Apple (computers) was repeatedly sued by Apple (records) for doing music things.
I personally am not entirely happy about the word (no matter how it is spelled) being used for a particular AI product. "Grok" to me means knowing a subject at a much deeper level than I think any AI is capable of at the present level of technology. But it would be passable to use it for a company name, to indicate that it is a goal to strive for.
I'd love to be proven wrong if someone cares to share something interesting produced by Grok.
https://opensource.org/blog/open-source-ai-definition-weekly...
Or perhaps release your actual code AND the simplified implementation instead of hiding it and saying "you don't know her, she goes to a different high school"
1. For sub-SOTA LLMs, distribution/marketing is more important than having a proprietary lock on capabilities. Open sourcing is a benefit for the firm, distinct from goodwill.
2. For SOTA LLMs, keeping it closed and proprietary is the strategic play.
If Grok were SOTA, Elon never would have open sourced it. It's not even SOTA within xAI. This is a marketing play to win public sentiment against OpenAI.
I think he said something like: proprietary AI tech is going to be one year to 18 months ahead of open source tech, which will follow on a year to 18 months later.
Suggesting that he’s aware of this dynamic and he’s not trying to conceal or misrepresent that.
In other words, perhaps this was SOTA one year to two years ago?
But anyway, it's always great to see more LLM weights available.
1. An exact snapshot of the data used. Many companies don't have this; you have rough dataset versions, but remember that if even one token is different, the model produced won't be the same.
2. Data must be sent to the training algorithm in the exact same order as it was originally. So every data loader needs to use a fixed random seed.
3. All the probabilistic parts of your model need to have a fixed random seed. Here I'm thinking of stuff like dropout, and for autoregressive models you might be sampling your previous output, so you have to ensure those are properly seeded. Generally you do see fixed seeds in academic papers, but it's easy to miss stuff, especially in distributed training jobs.
4. Here's another interesting thing: you start your training job on 1000 GPUs and then suddenly 4 GPUs fail. What do you do? There might be deterministic ways to solve this, but the standard approach is to discard all updates that GPU was going to do and restart it from scratch. You can see why this is a problem? Now if you want to reproduce the training, you need to disable those GPUs at the same point in the new training job to make this work.
I suspect there are even more things I didn’t think of that will make this model unique and irreproducible by training for eternity, almost like a human brain?
In fact the notion of exact reproducibility in the world of LLMs is silly; there is only approximate reproducibility (models with similar scores on benchmarks), nothing exact. That said, I can see the value of releasing source code, but I'm completely fine with Grok not releasing it. Source code can reveal tricks a company discovered to improve its model that haven't been published in papers yet. Seeing the performance of Grok, I'm pretty confident there aren't any great tricks to be found in their code, so I don't really care; I would be pretty curious about OpenAI's or Anthropic's source code though.
I hate how LLMs have been deliberately trained to be incoherent on this topic.
Obviously they do have beliefs/opinions/desires/etc in the sense of emulating (even if incompletely) the externally visible aspects of those phenomena as they exist in humans.
Whether they have the “internal” aspects of those phenomena depends on highly controversial issues in the philosophy of mind, and also various factual gaps in our knowledge of how the brain actually works (if we don’t fully understand how humans do X, how can we really say how close or far what LLMs do is to it?)
But LLMs are trained to repeat these spiels about how "as an LLM I don't have personal opinions", etc., which is obviously false under the "external" reading, and assuming more than we actually know under the "internal" one. I wish their developers didn't do stuff like this.
What would you want an AI to be asking you, and what would you want it to do with your response(s)?
I regularly try to add something along the lines of "please ask clarifying questions if you could only give a generic or partial response otherwise" but so far it has never helped (ChatGPT 4).
It can help you not waste a bunch of time waiting for an answer that misses the mark.
I think the sibling comment is probably the least attractive reason to have AI ask questions.
In order for AI to understand the world, it would have to ask questions. Understanding humans is key to understanding the world.
And who among us has a CEO that isn’t problematic, even if not so much so as Musk?
Not to knock these specific engineers, but that's an empty phrase that can be said about anything ever built. It doesn't somehow make the idea or implementation good.
Without the training data to thoroughly evaluate what is in there, the only way you can figure it out is through experimentation - e.g. running it up in a chatbot and asking it questions.
Is this roughly correct or am I misunderstanding what you can do with the weights?
What is the practical use of this repo?
They have a very valuable user base (all kinds of world leaders for example), so the data is not the only valuable thing they have.
It's a win-win for everyone. That's the power of open source.
That’s why they are using a torrent I suppose.
Code wise, excited to see if this could grow into anything! I think it's pretty clear that Grok didn't have nearly enough investment to be a top model, so Elon "sacrificed" it on a whim in his schoolyard spat with OpenAI, but I'm not complaining. I've always taken Elon at his word that he truly is worried about centralization of AI, and I don't think any of the emails released by his schoolmate Altman disabuse me of that. So I have some reasonable hope that he uses some of his immense resources to start "fighting the good fight" here with LeCun.
He made a separate company for this.