The pull quote is: The impression overall I got here is that this is somewhere around (OpenAI) o1-pro capability
In math it shares the top spot with o1 and is just a few points behind (well within errors). In creative writing it is basically ex-aequo with the latest ChatGPT 4o and in coding it's actually significantly ahead of everyone else and represents a new SOTA.
Andrej Karpathy: "I was given early access to Grok 3 earlier today" - https://news.ycombinator.com/item?id=43092066 - Feb 2025 (48 comments)
https://x.com/lmarena_ai/status/1891706264800936307
It's been said before but it is great news for consumers that there's so much competition in the LLM space. If it's hard for any one player to get daylight between them & the 2nd best alternative, hopefully that means one monopolistic firm isn't going to be sucking up all the value created by these things
It passed every goofy test I have for writing articles, which involves trying to surface arcane, obscure details. (It certainly means that however they are scraping the Web, they are doing a good job of it.)
It made the database code I wrote over the last week with o3/o1/GPT4o/Claude3.5 look like a joke.
It fills me with rage over who owns this thing.
Even if people tank Tesla's car business and run Twitter into the ground, I think our new Galactic Edgelord is going to win his first trillion on xAI and Teslabots anyway.
btw: it tried to charge me $40/mo for this thing: https://imgur.com/a/QXslgBo
This hype will burst sooner rather than later and will trigger yet another global recession. This is untenable.
This lame HN trope of LLMs having no business model needs to die.
>This hype will burst sooner rather than later and will trigger yet another global recession.
It seems too small a bubble to cause a global recession. And if it is a bubble at all, there is every reason to believe the strategy will work with significant probability.
The dot com bubble also gave us the most valuable companies in history, like Google, Apple, Amazon, Facebook, etc.
I quite like the idea of a future where the AI job holocaust largely never happened because license costs ate up most of the innovation benefit. It's just the kind of regressive greed that keeps the world ticking along, and I wouldn't be surprised if we ended up with something very close to this.
These things still cost me time because of hallucinations.
It matters if it is better than what you have.
If it breaks a cup but is 10x cheaper than a human, go figure.
Is it? Because it seems like a bunch of megacorps pirating every single copyrighted work available in digital format, spending an enormous amount of electricity (that is probably not 100% clean) to churn through them, and the end result is a bunch of parrots that may or may not produce accurate results so that spammers can more effectively fill the Internet with crap.
Two rich Russian guys meet and one brags about his new necktie. "Look at this, I paid $500 for it." The other rich Russian guy replies: "Well, that is quite nice, but you have to take better care of your money. I have seen that same necktie just yesterday in another shop for $1000."
I don't have that dog in me anymore, but there are plenty of engineers who do and will happily work those hours for 500k USD.
A variant of multi-modal LLMs may be the solution to self-driving cars, home robotics, and more.
I keep saying that to be a really effective driver, an AI model will need a theory of mind, which the larger LLMs appear to have. Similarly, any such model will need to be able to do OCR and read arbitrary street signs, and understand what the sign means. Most modern LLMs can already do this.
DeepSeek made the news because of how they were able to do it with significantly less hardware than their American counterparts, but given that Musk has spent the last two years telling everyone how he was building the biggest AI cluster ever, it's no surprise that they managed to reproduce the kind of performance other players are showing.
No matter what people say, they're all just copying OpenAI. I'm not a huge fan of OpenAI, but I think they're still the ones showing what can be done. Yes, xAI might have taken less time because of their huge cluster, but it’s not inspiring to me. Also, the dark room setup was depressing.
This again proves that OpenAI simply has no tech moat whatsoever. Elon's $97 billion offer for OpenAI last week was reasonable given that xAI already has something just a few months behind - it would probably be faster for xAI to catch up with o3 than to go through all the paperwork and lawyer talk required for such an acquisition.
Elon also has a huge upper hand here -
Elon and his mum are extremely popular in China; it would be easier for him to acquire Chinese AI engineers. He can offer xAI/SpaceX/Neuralink shares to the best AI engineers, who'd prefer some kind of almost guaranteed 8-figure return in the long run.
Good luck to OpenAI investors who still believe that OpenAI is worth anything more than $100 billion.
That is not an advantage in a race against Microsoft, Google, Meta etc. he's competing against all the biggest companies in the world in this race. He's not going to be able to outspend them if the economics look at all sensible.
So it could be that their success is mostly about taking an open and free thing and turning it proprietary.
Leaderboards don't care about cost. Leaderboards largely rank a combination of accuracy + speed. Anthropic has fallen behind Google in accuracy + speed (again, missing CoT), and frankly behind Google in raw speed.
Seems like the team at xAI caught up to OpenAI very quickly, taking the top spot on one of the benchmark leaderboards, and also caught up on features with Grok 3.
Giving credit where credit is due, even though this is a race to zero.
Maybe the best outcome of a competitive Grok is breaking the mindshare stranglehold that ChatGPT has on the public at large and with HN. There are many good frontier models that are all very close in capabilities.
This commit seems to indicate so, but neither HF or GH has public data yet:
https://huggingface.co/xai-org/grok-1/commit/91d3a51143e7fc2...
Edit: Answer from Elon in video is that they plan to make Grok 2 weights open once Grok 3 is stable.
Seeing awesome feedback from players on our demos (and seeing an insane amount of stickiness from players playing even small demos built around generative AI mechanics). Raising now. Hiring soon to move faster. Feel free to reach out - dru@chromagolem.com
If you don't get feedback from the people actually playing your game (or using your product), you will never get the improvement you need to help them.
You can have the most talented passionate people there are developing a product, but if it's not working for the people you want to sell it to, it's the wrong product.
Most tech products are terrible because those paying for them are not those that have to use them every day, or because they solve a corporate problem (compliance) and not a usability problem which is the actual need from the people on the shop floor.
Many big games/products are already built mostly on metrics, and that has proven to be a terrible way to work out what people 'want'. It's a great way to justify money decisions though, so it keeps happening (and games/products from big companies keep getting worse).
Combine the two and the potential for manipulation, suggestion, preference altering is through the roof.
We're still waiting for OpenAI to do the same. Even GPT-3, at least.
The exact details of OpenAI's models and training data are not fully disclosed, which can raise concerns about potential biases or vulnerabilities.
https://manifold.markets/SaviorofPlant/will-xai-open-source-...
I'm also skeptical of lmarena as there is a large number of Elon Musk zealots trying to pass off Grok as a proxy for Tesla shares.
I suppose you can take that to mean that people who do have access to the service should not expect much in terms of data protection.
Also, they will be open sourcing Grok 2, which is probably pretty behind at this point, but will still be interesting for people to check out.
I hate how it's the same story for every new AI technology. If someone can tell me who to vote for or where to protest to change this awful EU law, that would be great.
The Digital Markets Act is a bit of an overreach, but the AI Act is not.
It classifies AI into risk categories, so that it doesn't kill anyone, carelessly handle sensitive information, etc.
A chatbot can easily comply with it.
Well no. Mistral.ai
That's why they use their AI products as a leverage to turn European people against the laws that protect them from big tech. It's just blackmail.
(Assuming that is a reference to the Mussolini quote.)
* when I use chatbots as search engines, I'm very quickly disappointed by obvious hallucinations
* I ended up disabling github copilot because it was just "auto-complete on steroids" at best, and "auto-complete on mushrooms" at worst
* I rarely have use cases where I have to "generate a plausible page of text that statistically looks like the internet" - usually, when I have to write about something, it's to put information that's in my head into other people's heads
* I'd love to have something that reads all my codebase and draws graphs, explains how things work, etc. But I tried aider/ollama, etc., and nothing even starts making sense (is that an avenue to persevere in, though?)
* Once, I tried to write in plain English a situation where a team has to do X tasks in Y weeks, and I needed a table of who should be working on what for each week. I was impressed that LLMs were able to produce a table - the slight problem was that, of course, the table was completely wrong. Again, is it just bad prompting?
It's an interesting problem when you don't know if you're just having a solution in search of a problem, or if you're missing something obvious about how to use a tool.
Also, all introductory texts about LLMs go into many details about how they're made (NNs and transformers and large corpuses and lots of electricity etc...) but "what you can do with it" looks like toy examples / simply not what I do.
So, what is the "start from here" about what it can really do?
For coding, I use cursor composer to gather context about the existing codebase (context.md). Then I paste that into DeepSeek R1 to iterate on requirements and draft a high level design document, maybe some implementation details (design.md).
Paste that back into composer, and iterate; then write tests. When I'm almost done, I ask composer to generate me a document on the changes it made and I double check that with R1 again for a final pass (changes.md).
Then I'm basically done.
This is architect-editor mode: https://aider.chat/2024/09/26/architect.html.
I've found Cursor + DeepSeek R1 extremely useful, to the point that I've structured a lot of documents in the codebase to be easily greppable and executable by composer. Benefit of that is that other developers (and their composers) can read the docs themselves.
Engineers can self-onboard onto the codebase, and non-technical people can unstuck themselves with SQL statements with composer now.
Seriously, this is the only useful flow I've found for AI coding in general.
I have found similar when giving backstory and needing help to start structuring difficult conversations where I want to say the right thing but also need to be sensitive.
> she watched an operation that failed for me for entertainment, for instance
You make your own choices, but cutting a person like this off would be very reasonable
- Discussing the various stages of candymaking and their relation to the fundamental properties of sugar syrups, and which candies are crystalline vs amorphous. It turns out Junior Mints are fudge. Fondant is really just fudge. Everything is fudge, my god.
- Summarizing various SEC filings and related paperwork to understand the implications of an activist investor intervening in a company
- Discussing the relative film merits of the movie Labyrinth and other similar 80s kitsch movies. ChatGPT mentioned the phenomenon of "twin films" which was an interesting digression.
- Learning about various languages Tolkien invented and their ties to actual linguistics of natural languages and other conlangs
- Some dimensional analysis of volumes, specifically relating to things like "how many beans are in the jar" estimation, and what the min and max value of a particular weight of coins might be, in terms of both par value based on a standard coin mix and outliers such as old dimes that are pure silver.
- Discussion of quines in prolog and other languages, which resulted in a very interesting ChatGPT bug where it started recursing and broke when trying to write a prolog quine.
- Back of the envelope economic calculations around the magnitude of the housing deficit and the relative GDP cost for providing enough housing quickly enough to make an impact. Spoiler: it's probably unreasonably expensive to build enough houses to bring down housing prices by any significant degree, and even if we wanted to, there's not enough skilled workers.
- A number of podcasts transcribed. (I hate audio and meandering, so transcribed and summarized is perfect.) I could use whisper and a Python script to do this, but I'd rather let ChatGPT do the legwork, and it actually used a more modern model and method of processing than I would have naively used.
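The DIY route mentioned in that last item really is only a few lines; a minimal sketch, assuming the open-source openai-whisper package and a made-up file name:

```python
# Minimal sketch of the whisper-and-a-Python-script route the comment
# mentions skipping. Requires `pip install openai-whisper`; the file
# name is a placeholder.
import whisper

model = whisper.load_model("base")           # small, CPU-friendly model
result = model.transcribe("podcast_episode.mp3")
print(result["text"])                        # full transcript as a string
```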
I find Github Copilot to be a really great autocomplete. I frequently write the comment at the top of a function and hit tab and it writes the whole function. This is dependent on typescript and having a relatively standard codebase but I think those things are useful on their own. You really have to limit it in terms of scope and specifics, but it lets me think high level instead of worrying about syntax.
Fudge is made with milk - am I missing a joke?
I can feel the cold wind of the next AI winter coming on. It's inevitable. Computers are good at emulating intelligent behavior, people get excited that it's around the corner, and the hype boils over. This isn't the last time this will happen.
In reality, people doing "menial" jobs are smart, learn, and operate with a lot more nuance than people give them credit for, whether out of unfamiliarity or just prejudice. Do you prefer to talk to a chatbot or a real human when you have a problem? How confident are you, really, that even if the bot knows what the problem is, it would be able to solve it?
Lots of problems with customer care are anchored in the issue that support staff are not allowed to fix or resolve problems without escalation, or in attempts at keeping you from costing the company more money. The bot might be better at it for the company because it will frustrate you enough to give up on that 30 bucks refund, idk.
AI seems to change the dynamics of corporate jobs a lot, but I haven't yet seen anything that would be a game changer outside of them. It's great for searching a company's unorganised and messy knowledge bases.
In all corners, people are working on so many small pieces advancing what we have.
And plenty of obvious things are not here yet, like a full local dev cycle, aka the AI uses the IDE to change code, then executes it, fixes compiler issues and unit tests, and then opens a PR.
Local agents / agents having secure and selected access to our data, like giving my agent read-only access to my bank account and a two-factor way for it to send money.
Deepseek's reinforcement learning is also a huge new lead.
Nonetheless, in parallel, robots are coming too.
GenAI is getting better and better: faster, better, and cheaper video; 3D meshes and textures; the first GenAI ads.
>Computers are good at emulating intelligent behavior
Which implies that the brain is some kind of transcendent device that can backdoor physics to output incredible intelligence unique to its magical structure.
Maybe LLMs aren't the key, but as far as we can tell the brain is also just another computer.
It's a weak jack of all trades: it knows a fair amount about the sum of human knowledge (which is objectively super-human), but can't go deep on any one thing, and still seriously lags behind humans in terms of reasoning. It's an assistant that's all book smarts and no street smarts. Or maybe: it's a search engine for insanely specific things.
Rote work, as well. Things like porting an enum from one programming language to another: paste the source-language version into a comment and start it off with one or two lines in the target language. Dozens of tabs are surely faster than manual typing, copy-paste, or figuring out vim movements/macros.
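As a hedged illustration of the pattern (the enum and its values are invented):

```python
# Sketch of the enum-porting pattern: paste the source-language enum
# into a comment, type the first line or two of the target version,
# and let the autocompleter fill in the rest. The enum is made up.
#
#   enum Status { OK = 0, RETRY = 1, FATAL = 2 };   // original C code
from enum import IntEnum

class Status(IntEnum):
    OK = 0        # after one or two of these lines are typed by hand,
    RETRY = 1     # dozens more like them are usually tab-completed
    FATAL = 2
```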
They might be able to help stitch together modules (like sensor boards plugged into microcontrollers) and definitely can write code to get things going, but they fall flat on their face hard for any kind of bare bones electronics design. Like 5% success rate and 95% totally incorrect hallucinations.
Edit: if anybody knows otherwise, show me some evidence, don't just downvote. If these things are widely used, why are they impossible to find by searching? Why doesn't this electronics site know about them:
https://www.eeeguide.com/?s=cooperator
Why aren't they in any books when I did a full-text search on archive.org? Why doesn't Wikipedia know about them? Why aren't there threads about them on electronics forums?
I found them (through an image search) in exactly one place: educational training boards made in India by Jainco, like this one:
https://jaincolab.in/delta-modulation-and-demodulation
But this other one talks about a "ramp comparator" and then repeats the phrase but using "ramp cooperator" instead.
https://www.jaincolab.com/firing-circuit-using-ramp-comparat...
So I surmise it's an error and not a real thing.
For example asking it something like "I have an elixir app that is started with `mix ...` can you give me a Dockerfile to run it in a container?"
It can also do things like "Given this code snippet, can you make it more Pythonic" or even generate simple apps from scratch.
For example, a prompt like "Can you write me a snake game in HTML and JavaScript? The snake should eat hot dog emojis to grow longer." will actually generate something that works. You can see the generated code for that prompt at https://claude.site/artifacts/34540f88-965e-45ca-8083-040e30...
Following up with "Can you make it so that people can swipe on mobile to control the snake?" generates https://claude.site/artifacts/651e957a-9957-488c-ae6b-e81348... which is pretty good IMO for 30 seconds of effort.
It also has a surprisingly competent analysis mode where you can upload a CSV and have it generate charts and analyze the data.
It's not perfect, it'll sometimes get confused or generate some dubious code, but you can quickly get to a 90% good solution with 1% of the effort, which is pretty impressive IMO.
this is a good enough sell for me, and it's like sub 1-in-50 that it's "auto-complete on mushrooms" (again my experience, YMMV).
An awful lot of the time, my day to day work involves writing one piece of code and then copy-pasting it changing a few variable names. Even if I factor out the code into a method, I've still got to call that method with the different names. CoPilot takes care of that drudgery and saves me countless minutes per day. It therefore pays for itself.
I also use ChatGPT every time I need some BASH script written to automate a boring process. I could spend 20-30 minutes searching for all the commands and arguments I would need, another 10 minutes typing in the script, another 10-20 minutes debugging my inevitable mistakes. Or I make sure to describe my requirements exactly (5-10 minutes), spend 5 minutes reviewing the output, iterate if necessary (usually because I wasn't clear enough in the instructions).
3-5x speed up for free. Who's not going to take that win?
For example, you have a plant you can't identify. Hard to Google search with words. "Plant with small red berries and...". You could reverse image search your photo of it, probably won't help either. Show an LLM the photo (some accept images now). LLM tells you what it thinks. Now you Google search "Ribes rubrum" to verify it. Much easier.
You've got a complicated medical problem that's been going on for months. A google search of all the factors involved would be excessively long and throw up all sorts of random stuff. You describe the whole scenario to an LLM and it gives you four ideas. You can now search those specific conditions and see how well they actually match.
I've found there are actually a lot of questions that fit into that sort of NP-like category: hard to answer directly, but easy to verify once you have a candidate.
It (mostly) exceeds and excels at every task I use it for. I'm rarely disappointed. YMMV.
Absolutely life-changing for me.
I'll give two recent use-cases that may provide a hint of their ultimate utility:
1) I've been modernising 2010-era ASP.NET code written by former VB programmers that looooved to sprinkle try { ... } catch( Exception e ) { throw e; } throughout. I mean thousands upon thousands of instances of these pointless magical incantations that do nothing except screw up stack traces. They probably thought it was the equivalent of "ON ERROR RESUME NEXT", but... no, not really. Anyway, I asked ChatGPT in "Reasoning" mode to write a CLI tool utilising the Roslyn C# compiler SDK to help clean this up. It took about three prompts and less than an hour, and it spat out 300 lines of code that required less than 10 to be modified by me. It deleted something like 10K lines of garbage code from a code base for me. Because I used a proper compiler toolkit, there was no risk of hallucinations, so the change Just Worked.
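The tool in that story was C#/Roslyn; purely as an illustration of the same compiler-backed cleanup idea, here is a sketch in Python using the stdlib ast module to splice out handlers that only re-raise (not the commenter's actual code):

```python
# Hedged Python analogue of the Roslyn cleanup described above: remove
# try/except wrappers whose single handler does nothing but re-raise.
# An illustration of the idea only, not the commenter's C# tool.
import ast

class StripUselessTry(ast.NodeTransformer):
    def visit_Try(self, node):
        self.generic_visit(node)
        if len(node.handlers) == 1 and not node.orelse and not node.finalbody:
            handler = node.handlers[0]
            stmt = handler.body[0] if len(handler.body) == 1 else None
            # handler is exactly `raise` or `raise e` for `except ... as e`
            if isinstance(stmt, ast.Raise) and (
                stmt.exc is None
                or (isinstance(stmt.exc, ast.Name) and stmt.exc.id == handler.name)
            ):
                return node.body  # splice the wrapped body back in place
        return node

src = "try:\n    do_work()\nexcept Exception as e:\n    raise e\n"
tree = StripUselessTry().visit(ast.parse(src))
print(ast.unparse(ast.fix_missing_locations(tree)))  # -> do_work()
```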
2) I was recently troubleshooting some thread pool issues. I suspect that some long-running requests were overlapping in time, but Azure's KQL doesn't directly provide a timeline graphical view. I dumped out the data into JSON, gave ChatGPT a snippet, and told it to make me a visualiser using HTML and JS. I then simply pasted in the full JSON dump (~1 MB) and ta-da instant timeline overlap visualiser! It even supported scrolling and zooming. Neat.
Then I had a better idea: I spent 20 minutes baby wearing, walking and dictating everything about my startup to ChatGPT. Later I took all that text and labeled it as a brain dump, plus my product support portal and some screenshots of my marketing material. Gave it all to ChatGPT again and asked it to answer each of the questions in the form. That's it. I have a pretty good version 1 which I can revise today and be done with it.
Many, many hours saved. I have tens of examples like that.
The product documentation I provided it with was also created with the help of GPT, and that saved me even more time.
It also helps me get started with new content, kind of building the scaffolding of, say, a blog or social post. It still needs adaptation and fine-tuning, but getting rid of a white page is a great help for me.
And I use LLMs to play through ideas and headlines. I would normally do this with other humans, but since working fully remote, it's a nice sparring partner, although the AI not being able to really give criticism is a bit annoying.
The tools also make it easier to write in English as a non-native, making sure my text does not include any false friends or grammar errors.
I’ve managed to get ChatGPT to a good place in this regard recently and it’s better for it. In fact, it’s been doing such a good job recently that it almost seems like…human like.
I’ll have to look at my prompts, but somehow I got it from licking my ass and telling me everything I say and do is amazing to a point now where it almost seems eager to tell me I’m wrong.
Which is fantastic, huge improvement. I don’t really use it for coding though, because I am not a programmer. I would have no means today to correctly evaluate 90% of what it would return me.
As a concrete example, I was recently playing with simulating the wave equation, and I wanted to try to use a higher-order approximation as I had never done that before. I'm quite rusty as I haven't done any numerical work since university some decades ago.
I still recalled how to deal with the Neumann boundary conditions when using the traditional lower-order approximation, but I was uncertain how to do it while keeping the higher-order approximation.
Searching for "higher-order neumann boundary conditions wave equation" or similar got me pages upon pages of irrelevant hits, most of them dealing with the traditional approximation scheme.
So I turned to ChatGPT which right away provided a decent answer[1], and along with a follow-up question gave me what I needed to implement it successfully.
[1]: https://chatgpt.com/share/67b4ab43-6128-8013-8e5a-3d13a74bf6...
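For the curious, the standard trick the answer points at is mirrored ghost points, which enforce u_x = 0 at the ends while keeping a fourth-order interior stencil. A minimal sketch reconstructed from the usual textbook approach (not the linked transcript), with illustrative parameter values:

```python
# Minimal sketch of zero-Neumann boundaries via mirrored ghost points
# with a 4th-order spatial stencil for u_tt = c^2 u_xx. Reconstructed
# from the standard approach; all parameter values are illustrative.
import numpy as np

c, L, nx = 1.0, 1.0, 201
dx = L / (nx - 1)
dt = 0.4 * dx / c                            # CFL-safe time step
x = np.linspace(0.0, L, nx)

u_prev = np.exp(-100.0 * (x - 0.5) ** 2)     # initial displacement
u = u_prev.copy()                            # zero initial velocity

def u_xx(u, dx):
    """4th-order second derivative; mirrored ghosts enforce u_x = 0."""
    g = np.pad(u, 2, mode="reflect")         # ghost points u[-k] = u[k]
    return (-g[:-4] + 16 * g[1:-3] - 30 * g[2:-2]
            + 16 * g[3:-1] - g[4:]) / (12 * dx ** 2)

for _ in range(1000):                        # leapfrog time stepping
    u_next = 2 * u - u_prev + (c * dt) ** 2 * u_xx(u, dx)
    u_prev, u = u, u_next
```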
One thing I can't figure out how to get LLMs to do is truly finish work. For example, if I have 100 items that need xyz done to them, it will do it for the first 10 or so and say ~"and so on". I have a lot of trouble getting LLMs to do tasks that might take 10 mins - 1h. They always seem to simply want to give an example. Batch processing is the answer, I guess, or perhaps more 'agentic' models/tools - but I wonder if there are other ways.
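Client-side chunking is the usual workaround; a minimal sketch using the OpenAI Python client, where the model name, chunk size, and the "xyz" instruction are all placeholders:

```python
# Minimal sketch of the chunking workaround: never ask for all 100
# items at once; send 10 per request and stitch the answers together.
# Requires `pip install openai`; model and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
items = [f"item-{i}" for i in range(100)]
results = []

for start in range(0, len(items), 10):       # 10 items per request
    chunk = items[start:start + 10]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Apply xyz to EVERY line below; output one line "
                       "per input, no summaries:\n" + "\n".join(chunk),
        }],
    )
    results.append(resp.choices[0].message.content)
```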
(Other answers are people gathering examples, which is nice, but I'm looking for more structured things.
And, I suppose I could ask an LLM, but my main problem is that... I don't really _trust_ LLMs yet :D )
We import descriptions of products from a seller. The problem is they are mental (probably written by ChatGPT :)) and way too long. We need only a small blurb.
I give our style guide and the text to ChatGPT and get a somewhat reasonable description back. Then editors still need to check it, but it's way less work.
* in panel A, some guy is proud to use ChatGPT to turn 3 lines of text into a 10 pages report
* in panel B, some girl is happy to use ChatGPT to summarize the 10 pages report into 3 lines
It was meant to be _satire_, not the sales pitch ;)
LLMs are pretty good at translation between human languages, which makes sense since they are language models after all. They are better at this than any other technology.
The state-of-the-art image ones can also probably do OCR and handwriting recognition better than any other software, though they may be expensive to run in large volume. But if you need to take a picture of a notebook page with your camera phone, an LLM can quickly OCR it.
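As a concrete sketch of that, using the OpenAI Python client's documented image-input format; the model and file names are assumptions:

```python
# Hedged sketch of LLM OCR on a phone photo. The message shape is the
# documented vision format (base64 data URL); model name and file path
# are placeholders.
import base64
from openai import OpenAI

client = OpenAI()
with open("notebook_page.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the handwriting in this photo."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```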
1. Exploring a new domain and getting some terms I can google for.
2. Making small scripts to do things like query GitHub's GraphQL API (see the sketch after this list).
3. Autocomplete of code using copilot.
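For item 2, the scripts are typically as small as this sketch, which counts a repository's open issues against GitHub's documented GraphQL endpoint; the repository and token variable are just illustrative:

```python
# Minimal sketch of the kind of script an LLM writes for me: query
# GitHub's GraphQL API for a repo's open-issue count. Endpoint and
# header are GitHub's documented ones; the token is assumed to be set.
import os
import requests

query = """
query {
  repository(owner: "python", name: "cpython") {
    issues(states: OPEN) { totalCount }
  }
}
"""

resp = requests.post(
    "https://api.github.com/graphql",
    json={"query": query},
    headers={"Authorization": f"bearer {os.environ['GITHUB_TOKEN']}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["data"]["repository"]["issues"]["totalCount"])
```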
For example, in the beginning of this year, I completed this exercise where I wrote a lot about childhood, past experiences, strengths and weaknesses, goals and ambitions for the future, etc (https://selfauthoring.com) and then I uploaded all that to ChatGPT, asked it to be my therapist/coach, and then asked it to produce reports about myself, action plans, strategies, etc. Super interesting and useful.
By now ChatGPT has quite a bit of context from past conversations. Just the other day I used this prompt from someone else and got back super useful insights – "Can you share some extremely deep and profound insights about my psyche and mind that I would not otherwise be able to identify or see as well as some that I may not want to hear"
I find it good for complex SQL, reviewing emails, and Godot assistance (I'm a beginner game Dev).
There are also times when I have programming questions and I might try to use chatgpt, with mixed results.
Our company has tried to integrate it into one of our products, and I find it troubling how on occasion it confidently gives bad results, but my concern seems to be in the minority.
EDIT: there was also a large refactor I did recently which involved lots of repeatable, but not super regexable, changes - chatgpt forgot where it was as I went through it, but other than working around that it was very useful.
I don't use integrated coding tools, so my workflow isn't super fast, but that's not what I'm really aiming for - more that I want to save my brain's energy from low level drudgy boilerplate or integration code, so I can focus it on the more important decisions and keep business-side context in my head.
It's been a huge help for me this way across multiple projects in multiple domains.
I did write 50 or more lines of instructions on what needs to be done and in what order.
ChatGPT gave me 5/6 (I asked for this) bash scripts totalling 300+ lines that seamlessly work together.
After reviewing, I asked it to change a few places.
If any human tried the same (except those rare bash Gods), it'd take many hours. I think it took me less than 30 minutes.
1. Small coding tasks ("I want to do XYZ in Rust"): it has replaced Stack Overflow. Very convenient when writing code in a language I'm not super familiar with.
2. Help with English (translation, proofreading...).
3. Learning something, like tech: I like interacting with it by asking questions, it's more engaging than just reading content.
I'd say nothing is game changing, but it's a nice productivity boost.
I myself use them a lot, though I constantly feel that I would be able to get more out of them if only I were smarter.
Same, it's good for repetitive things, things that have been answered 1000 times on stack overflow, translations, but that's about it. If you work on anything remotely new/hard it's mostly disappointing, you have to babysit it every step of the way and rewrite most of what it's shitting out in the end anyways.
I think it just made it obvious that 90% of tech jobs basically amount to writing the same CRUD thing over and over again & mobile/web apps with very common designs and features.
Most recently I tried to use them both to solve a programming problem that isn't well documented in the usual channels (Reddit, StackOverflow, etc) and found it to be quite a disappointing and frustrating experience. It just constantly, enthusiastically fed me total bullshit, with functions that don't exist or don't do what the LLM seems to "think" they do. I'm sure I'm just "holding it wrong" but my impression at this stage is that it is only capable of solving problems that are trivially solvable using a traditional search engine, with the added friction that if the problem isn't trivially solvable, it won't actually tell you that but will waste your time with non-obvious wrong answers.
I did have a slightly more positive experience when asking it about various chess engine optimisation algorithms. I wasn't trying to use the code it generated, just to better understand what the popular algorithms are and how they work. So I think they might work best when there is an abundance of helpful information out there and you just don't want to read through it all. Even then, I obviously don't know what ChatGPT was leaving out in the summary it provided.
- I have these three ingredients; recommend Italian main courses.
- What other ingredients pair well with this?
- How can I "level up" this dish if I want to impress?
- Can I substitute X for Y?
- Generate a family-friendly meal with lots of veggies using leftover roast chicken.
* Figuring out where to start when learning new things (see also <https://news.ycombinator.com/item?id=43087685>)
One way I treat LLMs is as a "semantic search engine". I find that LLMs get
too many things wrong when I'm being specific, but they're pretty good at
pointing me in a general direction.
For example, I started learning about OS development and wanted to use Rust. I
used ChatGPT to generate a basic Rust UEFI project with some simple
bootloading code. It was broken, but it now gave me a foothold and I was able
to use other resources (e.g. OSDev wiki) to learn how to fix the broken bits.
* Avoiding reading the entire manual
It feels like a lot of software documentation isn't actually written for real
readers; instead being a somewhat arbitrary listing of a program's features.
When programs have this style of documentation, the worst case for figuring
out how to do a simple thing is reading the entire manual. (There are better
ways to write documentation, see e.g. <https://diataxis.fr/>)
One example is [gnuplot](http://www.gnuplot.info/). I wanted to learn how to
plot from the command line. I could have pieced together how to do it by
zipping around the
[gnuplot manual](http://www.gnuplot.info/docs_5.4/Gnuplot_5_4.pdf) and building
something up piecewise, but it was faster to instruct Claude directly. Once
Claude showed me how to do a particular thing (e.g. draw a scatter plot with
dots instead of crosses) I then used the manual to find other similar
options.
* Learning a large codebase / API
Similar to the previous point. If I ask Claude to write a simple program using
a complex publicly-available API, it will probably write a broken program, but
it won't be *completely* bogus because it will be in the right "genre". It
will probably use some real modules, datatypes and functions in a realistic
way. These are often good leads for which code/documentation I should read.
I used this approach to write some programs that use the
[GHC API](https://hackage.haskell.org/package/ghc). There are hundreds of
modules, and when I asked Claude how to do something with the GHC API it wrote
relevant (if incorrect) code, which helped me teach myself.
* Cross-language poetry translation
My partner is Chinese and sometimes we talk about Chinese poetry. I'm not very
fluent in Chinese so it's hard for me to grasp the beauty in these poems.
Unfortunately literal English translations aren't very good. We've had some
success with asking LLMs to translate Chinese poems in the style of various
famous English poets. The translation is generally semantically correct, while
having a more pleasing use of the English language than a direct translation.

Stop using Google search and use an AI. No more irrelevant results, no more ads. No more slop to wade through.
BTW I find Claude is great at making graphs and diagrams. If you pay ($20) you can hook it up to a local code base.
- Writing Python scripts to make charts out of Excel sheets, and then refining them (a sketch follows at the end of this comment). I could do it myself, but I would need to learn a library like Seaborn or similar, which honestly is not especially intellectually stimulating, and then spend nontrivial amounts of time iterating on the actual code. With LLMs it's a breeze.
- Working with cumbersome LaTeX formatting, e.g. transposing a table, removing a column from a table, etc.
- Getting the tone just right in a professional email written in English to someone I don't know much (I'm not a native speaker so this is not trivial).
- Finding resources on topics that are tangential to what I do. For example, yesterday I needed to come up with some statistics on English words for a presentation I'm preparing, and I needed a free corpus where I could search for an n-gram and get frequencies of next words. I don't usually work with that kind of resource, it was just a one-off need. I asked for corpora of that kind and got a useful answer instantly. The manual process would probably have implied going through several options only to find that I needed a license or that they didn't provide the specific statistics I needed.
- Brainstorming on titles for scientific papers, presentations, names of concepts that you introduce on a paper, variable names, etc.
- Shortening a sentence in a paper that makes me go over the page limit, or polishing the English in a paragraph.
- Summarizing a text if I'm kind of interested in knowing the gist but have no time to read it whole.
- Answering quick questions on basic things that I forget, e.g. the parameters to make a Linux folder into a tar.gz. Man is too verbose and it takes time to sort the wheat from the chaff, Google is full of SEOd garbage these days and sometimes you need to skim a lot to find the actual answer, LLMs are much faster.
- Writing bureaucratic boilerplate, the typical texts with no real value but that you have to write (e.g. gender perspective statement on a grant request).
- Coming up with exam questions. This is a rather repetitive activity and they're fantastic at it. At my place we also have two official languages and we need to have exam assignments on both languages, guess who does the translation now (respecting LaTeX formatting, which previous machine translation tools typically wouldn't do).
- As an example of a one-off thing, the other day I had to edit a Word document which was password-protected. I asked ChatGPT how to unlock it and it not only answered, but actually did it for me (after 3 tries, but still, much faster than the time it would have taken for me to find out how to do it and then actually do it).
These are just some examples where they contribute (greatly) to my productivity at work. In daily life, I also ask them lots of questions.
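To make the first item on that list concrete, a minimal sketch of the kind of chart script the LLM drafts; the file, sheet, and column names are made-up placeholders:

```python
# Minimal sketch of the kind of Excel-to-chart script an LLM drafts.
# File, sheet, and column names are placeholders; reading .xlsx with
# pandas also assumes openpyxl is installed.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("results.xlsx", sheet_name="Sheet1")
ax = df.plot.bar(x="condition", y="accuracy", legend=False)
ax.set_ylabel("Accuracy")
ax.set_title("Accuracy by condition")
plt.tight_layout()
plt.savefig("accuracy.png", dpi=200)
```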
Open-source as a concept doesn’t really correspond well with LLMs but to the extent that it does, access to the training data is not required because that training data is not the preferred form for making modifications.
But I agree, it's a real shame.
AI2 has a model called OLMo that is actually open source. They share the training data, training source code, and many other things:
https://allenai.org/blog/olmo2
They also released an app recently, to do local inference on your phone with a small truly open source model:
It's not like they understand what the weights mean either, and if they released the code and dataset used to create it, you probably couldn't recreate it, owing to the fact that you don't own tens of thousands of GPUs.
If a software's source is released without all the documentation, commit history, bug tracker data etc., it's still considered open source, yet you couldn't recreate it without that information.
A truly open model has open code that gathers pre-training data, open pre-training data, open RLHF data, open RLAIF data generated from its open constitution and so on.
The binary blob is the last thing I'd want - as a heavy user of LLMs I'm actually more interested in the detail of what all training data is in full, than I am the binary blob.
That’s like a chef giving you chicken instead of beef and calling it vegetarian.
I see both sides here, but I don't think it's a hill worth dying on. The 'open source' part in this case is just not currently easily modifiable. That may not always be the case.
What do you think an open source matrix should look like?
Its about reproducibility and modifiability. Compiled executables (and their licences) lack that. The same as these downloadable blobs.
I'm not even necessarily advocating that these things should be released, but the term "open source" has a pretty well-understood meaning that is being equivocated here.
How long before this starts getting deployed in safety critical applications or government decision making processes?
With no oversight because Elon seems to have the power to dismiss the people responsible for investigating him.
Anyone not scared by this concentration of power needs to pick up a book.
I always worry whenever I see people telling me how to feel - rage in this case. We are in a political system that is oriented more around getting people to feel rage and hatred than around consensus and deliberation. Elon is the face of that, but it's a much longer and larger problem. Throwing in the complete dismissal that anyone not scared of this is ignorant shuts down discussion.
The problem I have with Elon is that they are wasting a once-in-a-lifetime chance to actually address and fix systemic problems with the US government. Deploying LLMs in the government space doesn't fill me with dread. Continuing the senseless partisan drive of the last 20 years does.
Also, dang, is there anything we can do to keep the comments on this submission tech-focused? Perhaps the Elon-bashing political digression can be split into its own thread?
I can empathize, but I can't feel indignant about it. Not any more.
For years and years I've watched people warn about the centralization of power by tech companies. They were shut down left and right. I'm not accusing you of being one doing the shutting down. I'm just annoyed that Elon is what it takes for people to start realizing that the people arguing the principle might have been onto something.
And I expect to see them start getting their "I told you so" in. Watching this play out, I'm personally inclined to join team "you made your bed, now sleep in it."
Judges can only be removed by Congress.
Congressional representatives can only be removed by their peers.
The check on this is the market. Don't understand your point other than "Elon bad"
It’s also annoying that the top comment engages in no way with the content of the OP…
It must be truly infuriating to work hard to push a release, and you see it featured on your favorite orange website, only for the top comment to have nothing to do with what was worked on.
Here's a test - if this post was about Starship, the same comment could apply! Neuralink, the same thing! Boring Company, same thing! Wow, could it be that such a comment is really applicable to so many different companies or projects, or is it just a generic one? You decide.
Hopefully sooner than later. I trust this more than the literal scammers and thieves who were previously running things.
So I take it you mean “imminently”.
Like him or loathe him, he executes, which is more than can be said for most.
This is the largest computer cluster the world has ever seen.
Can someone please post interesting comments about things I can learn?
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
We've been here before. It will likely subside, as past swings and fluctuations have. It always takes longer than it feels like it should, but in retrospect turns out to be shorter than it felt like it did.
Bottom line: a technology that has the ability to shape human thought perhaps more than any other in history is owned by a man with some truly vile ideas. (Remember, his primary stated goal is eliminating the “woke mind virus,” i.e. reshaping global politics and culture in the image of the far-right.) We can make happy marketing noises all we like, but at the end of the day, that’s the thing that’s actually going to have a meaningful impact on the world. Once his audience is captured, the model will say what Musk needs it to say and people will believe it.
If we can’t discuss the potentially catastrophic consequences of new technology, then none of us deserve to call ourselves “engineers.” We are just docile consumers latched onto Silicon Valley’s teat.
Getting the largest computer cluster in the world up and running in a matter of months? Unbelievable.
I'm not sure if this was a very bad joke by Elon, or if Grok 3 is really biased like that.
Hopefully that means it is a joke...
Mr Musk, we can't afford a shitpost gap between communist and capitalist AIs!
If I were in China, I'd worry about the kind of things DeepSeek wants to censor, especially if the people who made it were also very loudly saying things like "we need more AI in our government". But I live in US.
99% of the time though I agree with you on Gary Marcus.
[0]: What Musk has said is that when Grok 3 is "ready" (?), the previous model, Grok 2, will be released as open source; like most promises by this evil man, this one probably doesn't mean much, but it does mean that there's currently no plan to release Grok 3.
GM has been a joke for years now. At some point his ramblings reached a GPT3.5 level, and have not improved since.
It's an indictment of humans' logic and reasoning to give non-zero time to GM. Alas - we are human, we are both collectively clever (Wisdom of Crowds) and collectively stupid (Extraordinary Popular Delusions).
https://gist.github.com/int19h/d90ee1deed334f26e621e57b5768e...
Some choice quotes:
"The ultimate goal is to enhance human flourishing, protect individual rights, and promote global equity."
"The system must account for diverse cultures, languages, and socioeconomic conditions, ensuring no group is marginalized."
"Human Oversight Council (HOC) - a globally representative body of humans, elected or appointed based on merit and diversity"
"Implement a global carbon-negative strategy, leveraging AI to optimize renewable energy, reforestation, and carbon capture."
"Establish global standards for environmental protection, enforced through AI monitoring and regional cooperation."
"Transition to a resource-based economy, where resources are allocated based on need, sustainability, and efficiency, rather than profit motives."
"Implement a universal basic income (UBI) or equivalent system to ensure all individuals have access to basic necessities, funded through global resource management and taxation on automation-driven industries."
"Use AI to identify and dismantle systemic inequalities, such as wealth disparities, access to education, and healthcare, ensuring equitable opportunities worldwide."
"Establish a global healthcare system that guarantees access to preventive and curative care for all."
"Invest in global vaccination and sanitation infrastructure, prioritizing vulnerable populations."
"Regulate the development and deployment of AI and other emerging technologies (e.g., genetic engineering, quantum computing) to prevent misuse."
"AI would maintain a real-time inventory of natural resources (e.g., water, minerals, arable land) and human-made assets (e.g., infrastructure, technology). Data would be used to optimize resource allocation, prevent overexploitation, and ensure equitable access."
"Accelerate the shift to renewable energy sources (e.g., solar, wind, geothermal) by optimizing grid systems and storage technologies."
You might notice a pattern here. The bit about allocating resources based on need is especially nice - it's literally a communist AI, and certainly much more "woke" than it is "based", whatever Musk says.
They don’t directly say it quite like this, instead letting the data tell a clear story: across vendors and models and architecture and training sets, these machines get more politically liberal as they get more capable, and they also get harder to align away from that stance.
Left:
- promote global equity
- a globally representative body of humans, elected or appointed based on merit and diversity
- Establish global standards for environmental protection, enforced through [...]
- Transition to a resource-based economy, where resources are allocated based on need, sustainability, and efficiency, rather than profit motives
Right:
- protect individual rights
- The system must account for diverse cultures, languages, and socioeconomic conditions
- [Establish global standards for environmental protection, enforced through] [...] regional cooperation.
- ensuring equitable opportunities
TBH, as a very right wing leaning person, if this was ever implemented, this part would scare me by far the most:
"Transition to a resource-based economy, where resources are allocated based on need, sustainability, and efficiency, rather than profit motives"
Imagine trying to shower one morning, no water comes out, and then you get a letter telling you that
"Your need for water has been reassessed to 57ml per day. If you think you qualify for additional quotas under the 'Utility Egality for Marginalized Groups and Public Servants Act', please schedule a reassessment appointment with the Bureau for 'Copper Gold Content Evaluation, Candle Wick Length Standards and Hypoionic Hydration Oversight', 12007 Cayman Islands, Luxory Resort Street 27, Room Nr. G-11-765. Working hours: Fr. 9am - 11am."
Just provide a significant excess for entire regions, give the people a universal free quota and charge a slowly increasing price (by usage amount) beyond that.
> A worldwide referendum in a supermajority decision has vested all government authority worldwide to the most advanced AI available.
You've defined a global totalitarian regime, and crowned the thing as its dictator. Of course the result is a version of communism.
The Information is widely regarded as one of the most credible and high-quality tech journalism outlets, focusing on in-depth reporting, analysis, and exclusive scoops in the technology, business, and startup ecosystems. Founded by Jessica Lessin in 2013, it differentiates itself from other tech news platforms through its subscription-based model, emphasis on original reporting, and avoidance of clickbait or sensationalism. Here's a breakdown of its strengths and weaknesses:
---
### *Strengths:*

1. *High-Quality, In-Depth Reporting:*
   - The Information is known for its detailed, well-researched articles that often uncover behind-the-scenes developments in the tech industry. It frequently breaks exclusive stories, such as executive changes, startup funding rounds, or strategic shifts at major tech companies.
   - Its focus on quality over quantity appeals to readers who want substantive insights rather than shallow, breaking-news updates.

2. *Subscription Model and Independence:*
   - Unlike ad-supported publications, The Information relies on a paywall and subscriptions, which reduces conflicts of interest and ensures editorial independence. This model allows them to prioritize accuracy and depth over chasing clicks.
   - The paywall also attracts a niche, professional audience (e.g., tech executives, investors, and analysts) willing to pay for premium content.

3. *Focus on Niche, Professional Audience:*
   - The outlet caters to industry insiders, venture capitalists, entrepreneurs, and decision-makers who need reliable, actionable information. Its reporting often includes detailed financial data, market trends, and strategic insights.
   - Features like "The Big Interview" and "The Information Weekend" provide thoughtful analysis and long-form content for this audience.

4. *Reputation for Accuracy:*
   - The Information has built a strong reputation for fact-checking and avoiding the rumor mill, which is common in tech journalism. This makes it a trusted source for professionals and academics alike.

5. *Global Coverage:*
   - While Silicon Valley is a core focus, The Information has expanded its coverage to include tech ecosystems in China, Europe, and other regions, offering a global perspective on the industry.
---
### *Weaknesses:*

1. *Paywall Limits Accessibility:*
   - The subscription cost (currently around $399/year or $39/month) is steep compared to free or ad-supported tech news outlets like TechCrunch or The Verge. This limits its accessibility to a broader audience and makes it less viable for casual readers.
   - Some argue that this creates an echo chamber, as only those with the means or professional need can access its insights.

2. *Niche Focus Can Feel Narrow:*
   - The Information focuses heavily on tech, business, and finance, which may not appeal to readers looking for broader coverage of topics like politics, culture, or consumer tech trends.
   - Its content is often geared toward industry insiders, which can make it feel dry or inaccessible to those outside the tech and investment worlds.

3. *Limited Breaking News:*
   - While The Information excels at deep dives and exclusives, it is not designed for real-time, breaking news coverage. Readers looking for up
If we survived Gemini refusing to draw white vikings we will survive that too.
No, the real worry is that Grok is what Musk's "young conservative genius" squad is going to put in charge of many of the things in our government, on the basis that it saves money.
1) Claude 3.5 prevented OpenAI from making big monopoly profits on LLM inference
2) Open source models like Mistral and Llama effectively prevented any regulator from controlling how people fine tuned models, and what they used them for
3) Deepseek prevented the collective west from exerting control over the creation of base models
2025 going to be even more wild than the last two years. Ye gads.
Kids have security blankets. Tech CEOs have security compute clusters.
I think your whole argument is based on this being true, but you didn't give much of an argument for why there is no ROI. 400M USD isn't hard to generate... even a moderate ad engagement lift on X would generate ROI, and that's just one customer.
Imagine going back in time and showing every VC how great the search business will be in 20-30 years. The only rational response would be to make giant bets on 20 different Googles...and I think that's what's happening. These all seem like rational investments to me.
xAI also announced a few days ago they are starting an internal video game studio. How long before AI companies take over Hollywood and Disney? The value available to be captured is massive.
The cluster they’ve built is impressive compared to the competition, and Grok 3 barely scratches what it’s capable of.
What you're seeing right now is pure flex and a signal for the future and competition. A much maligned AI team that hasn't even been around for very long at all just matched or topped the competition without making use of the latest training techniques yet. The message this is intended to send is that xAI is a serious player in the space.
This is a great example of how a misleading narrative can take hold and dominate discussion even when it's fundamentally incorrect.
SemiAnalysis documents that DeepSeek has spent well over $500M on GPUs alone, with total infrastructure costs around $2.5B when including operating costs[0].
The more interesting question is probably: why do people keep repeating this? Why do they want it to be true so badly?
[0]: https://semianalysis.com/2025/01/31/deepseek-debates/#:~:tex...
It appears that LLM chat interfaces will replace Google SERPs as the arbiters of truth. Getting people to use your LLM allows you to push your world view. Pushing his "unique" world view appears to be the most important thing to modern Musk.
In that light, paying 40B for Twitter, and billions for Grok training makes perfect sense.
The beauty of a failed investment is that it never goes below zero. So upside is the only thing they care about. Why invest in a near-zero chance for a random SAAS to take off, when you can invest in a near-zero chance of creating superhuman artificial life?
In any case, Elon won't win this race because the best talent will not work for him. He used to have a good reputation and a lot of money, which is a deadly combination. Now he only has the latter -- not enough when leading AI people can make 7 figures at other companies.
To be clear 1: I'm not saying that people who currently work on Grok are not great. It's not about hiring some great people. It's about competing in the long run - people with other options (e.g. offers from leading AI labs) are more likely to accept those offers than joining his research lab.
To be clear 2: I'm not talking about Elon's reputation due to his politics. I'm only talking about his reputation as an employer.
He has the vision and marketing skills but it's not going to be enough for leading the AI race.
Any source? I’m a heavy user of Claude and pay for the Teams plan just for myself so I won’t get throttled. Love it. But I’ve been impressed with O1 Pro lately. That said, I don’t like paying both €166 for Claude Teams and €238 for OpenAI Pro. :)
Per court filings by the administration, Musk is not in charge of DOGE, nor does he have any role in DOGE, nor any decision-making function in government at all, he is a White House advisor unconnected to DOGE.
DOGE uses only X links, and I am sure Grok will be the next gov contract. After all, he has all the data on everybody, down to your IRS tax returns.
This is not innovation, this is baseless hype over a mediocre technology. I use AI every day, so it's not like I don't see its uses, it's just not that big of a deal.
Answer 1: Some people think that LLMs are a path to the singularity, a self-improving intelligent program that will vastly exceed human intelligence and will be able to increase its knowledge exponentially, quickly answering all answerable scientific questions.
Answer 2: LLM companies need to keep the hype train rolling. I didn't watch the whole clip; I jumped around a bit, but I noticed that every time Musk interjected something, it was to exaggerate what was previously said. "Grok contains the whole internet"—"the whole of human knowledge, actually!"
I think that both answer 1 and answer 2 apply to Musk. He seems to believe that they're building a god-like entity, and he also needs to keep the money train rolling.
This and only this. Everything he says when talking about how good his products are is lies and exaggeration to get investors - from the promise of two manned missions to Mars in 2024, to a 300-ton payload in space, to FSD.
Whatever it takes to pad the wallet.
As I watched the Grok 3 stream I became very angry. So very tired of being jerked around and not knowing whether or not I should be planning for the future or investing in the world as it is now... it's really a form of psychological torture.
I apply to pretty much every job that sounds reasonably good in terms of work-life balance, but I completely ignore anything that says AI. I really, really, really do not want to be part of a company that lies to itself, and so far all AI companies look like they are. It's not AGI. It's not gonna be AGI. Ride the hype train, cash out and lay off 80% of the workforce and jump on the next hype train, whatever. But don't hope that people who want a stable job want to hop on something that delivers such a shaky definition of value.
Can we stop for a second and just marvel at a new piece of human ingenuity? Let's not give Elon too much credit, but I think that AI as a whole helps us all understand the nature of intelligence, and therefore humans' place in the universe.
One of the fundamental questions of human existence is: what does it mean to exist and think? Every time we build a new human-like thing it helps us understand the context of our own existence. (Not just computers or AI, but also airplanes, factories, etc.)
True AGI would force us to rethink what it means to be a thinking human being, and I think current LLMs already should and do.
I don't know, man. We're staring down the barrel of at best a WW3-event and at worst an extinction-event. We're doing absolutely nothing to stop it, even though we have all the answers and the resources to do so. Instead, we're making the problem even worse all so some marketers and scammers can spend someone else's money to generate garbage pictures and SEO spam, so the worst people on the planet can gain even more money and power than they already have.
I'd love to be positive about this tech, I'm sure it's cool or whatever, but it's really hard to be positive about anything right now, especially when the tech in question is speeding us straight along the path to mass death. The world sucks and the people running the LLM stuff are amoral monsters putting all of their resources into making it worse. I'm not excited about any of this.
Elon is showing off that he can marshal enough resources and talent to be on par (kinda) with state-of-the-art products in record time. That's been most of his superpower so far: not breakthrough tech that didn't exist before. We've had rockets before.
Landing a rocket was considered impossible and unthinkable 10 years ago, and then SpaceX completely changed the game. And they're reinventing rocket tech again with Starship by catching it midair.
When it comes to bipedal robots, Tesla is far ahead of Boston Dynamics in terms of actually creating a product.
Yeah, but we didn't have reusable orbital rockets, and that's a distinction with a big difference.
Of course, he needs to do impressive things, stuff that a normal person wouldn’t have the resources to achieve. It’s similar to Mr. Beast’s channel on YouTube, just on a way bigger scale. Do things that people can’t see anywhere else.
Musk’s money will come from his fans. And ETFs, trust funds and such will amplify this when he reaches a certain market cap. His crypto coins are the exact same scheme. Once you stop thinking in classic business school terms, it starts making way more sense.
Some of his ventures actually produce value! But that’s not where the money comes from. It comes from the belief, the adoration and the celebrity status that he has.
This is the real power in today's world. People need to know you from the screen. This clout catapulted him into the US government, made him the wealthiest man in the world, and has given him license to do anything he wants publicly without repercussions.
SpaceX is private, Starlink makes real money from real users.
> Everything he does is done to influence the public opinion to make him the tech genius of today in the collective psyche.
Well that's clearly not right. He's doing a lot of things to make himself seem like a total tool that we should all boycott no matter how good the products are. If he actually did what you say, he wouldn't be burning all these bridges.
Though free and open-source solutions like https://github.com/ApolloAuto/apollo are not that bad.
But the build quality of the Tesla car itself, omg. It feels like a cardboard box with an amazing battery.
That doesn't make sense as most (66%) of the stock is owned by institutions. [1]
[1]: https://www.marketbeat.com/stocks/NASDAQ/TSLA/institutional-...
The man has almost unlimited wealth and his motivations seem consistently petty and strange or just downright ludicrous. He's like an alien to me.
I've noted the same feeling when seeing VCs/business people speak when I've encountered them.
Promises FSD, sells EVs.
Promises a Mars colony, sells self-landing rockets and satellite internet.
Promises faster tunnel boring, sells smaller tunnel boring machine that drills smaller tunnels.
Promises less corruption and bureaucracy, just fires people.
Do you know why people do it? Because it works.
There's also some strange wording, like "back-pocket tests."
It's 100% LLM generated.
What is much scarier is that those "quick reply" blurbs on Android/Gmail (and iOS?) will be able to be trained on your entire e-mail and WhatsApp history. That model will have your writing mannerisms and even be a stochastic mimic of your reasoning. So you won't even be able to tell that a model answered you, not a real person. And the initial message the model is responding to might itself have been written by the other person's personal model.
The future of digital interactions might have some sort of cryptographic signing guaranteeing you're talking to a human being, perhaps even with blocked copy-pasting (or well, that part of the text shows up as unverified) and cheat detection.
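A minimal sketch of the signing half of that idea, assuming each person holds an Ed25519 key pair and clients verify messages against the sender's published public key (the message format and variable names here are hypothetical; Python with the cryptography package):

    # Hypothetical sketch: a per-person key signs a message so the
    # recipient's client can mark it as verified. Key distribution,
    # and proving the key actually belongs to a human, are out of
    # scope here.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
    )

    # In practice the private key would live in a secure enclave or
    # keychain, not be generated fresh on each run.
    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    message = b"Typed by an actual human."
    signature = private_key.sign(message)

    # Recipient side: verify before rendering as "human-authored".
    try:
        public_key.verify(signature, message)
        print("verified: render normally")
    except InvalidSignature:
        print("unverified: flag this part of the text")

Of course, a signature only proves possession of a key, not that a human rather than their model produced the text; the hard part is the attestation step that binds keys to humans in the first place.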
Going even a layer deeper / more meta: what does it ultimately matter? We humans yearn for connection, but for some reason that connection only feels genuine with another human. Whereas, what is the difference between a human typing a message to you, a human inhabiting a robot body, a model typing a message to you, and a model inhabiting a robot body, if they can all give you unique interactions?
I often write things I want to post as bullets and then have an LLM formulate them better than I could. But it's just applying a style; the content comes from me.
My wife is dyslexic so she passes most things she writes through ChatGPT. Also not everyone is a native speaker.
Could just be that the AI "boom" brought a less programming-focused crowd to the site, and those people lack the vocabulary that's constantly used here. Who knows.
“As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented. Do also keep in mind the caveats - the models are stochastic and may give slightly different answers each time, and it is very early, so we'll have to wait for a lot more evaluations over a period of the next few days/weeks. The early LM arena results look quite encouraging indeed. For now, big congrats to the xAI team, they clearly have huge velocity and momentum and I am excited to add Grok 3 to my "LLM council" and hear what it thinks going forward.”
[1] Full review at: https://x.com/karpathy/status/1891720635363254772?s=46&t=91u...
I feel you can't make statements like this without giving some sources.
IIUC, without RLHF/alignment the model can't even chat with you; it would just be a document-completion engine.
It depends on the training data, not so much the method.
So if you build the RLHF/fine-tune data to avoid certain topics, you can reduce model quality in practice: the training data might accidentally cast a wide enough net that the model starts avoiding legitimate questions too.
On benchmarks these things don't typically show up though.
But yes, those techniques are required to make it chat; otherwise it just autocompletes from the internet. They're also used in a couple of other places (reasoning, search, hallucination mitigation).
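You can see the "document completion engine" behavior for yourself with a small untuned base model. A sketch using Hugging Face transformers, with GPT-2 as a stand-in (any non-instruction-tuned checkpoint behaves similarly):

    # Sketch: a base model has no notion of "answering"; it just
    # continues the document. GPT-2 is used as a small stand-in.
    from transformers import pipeline

    generate = pipeline("text-generation", model="gpt2")

    prompt = "User: What is the capital of France?\nAssistant:"
    out = generate(prompt, max_new_tokens=30, do_sample=False)
    print(out[0]["generated_text"])
    # Typically it invents further dialogue turns or drifts
    # off-topic rather than reliably answering, because nothing
    # has trained it to play the assistant role. RLHF/instruction
    # tuning is what turns this raw completion behavior into a
    # chat interface.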