https://newsletter.semianalysis.com/p/tpuv7-google-takes-a-s...
"OpenAI’s leading researchers have not completed a successful full-scale pre-training run that was broadly deployed for a new frontier model since GPT-4o in May 2024, highlighting the significant technical hurdle that Google’s TPU fleet has managed to overcome."
Given the overall quality of the article, that is an uncharacteristically convoluted sentence. At the risk of stating the obvious, "that was broadly deployed" (or not) is contingent on many factors, most of which are not of the GPU vs. TPU technical variety.
They would have taken some time to calculate the efficiency gains of pretraining vs. RL, resumed GPT-4.5 training for whatever budget made sense, and then spent the rest on RL.
Sure, they chose not to serve the large base models anymore for cost reasons.
But I’d guess Google is doing the same. Gemini 2.5 samples very fast and seems way too small to be their base pretrain. The efficiency gains in pretraining scale with model scale, so it makes sense to train the largest model possible. But then the models end up super sparse and oversized, and make little sense to serve in inference without distillation.
In RL the efficiency calculus is very different, because you have to run inference on the model to draw online samples. So smaller models start to make more sense to scale.
Big model => distill => RL
Makes the most theoretical sense nowadays for efficient training spend.
So they already did train a big model, GPT-4.5. Not using it would have been absurd, and they have a known recipe they could return to scaling if the returns justified it.
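To put rough numbers on that, here's a back-of-the-envelope sketch using the standard approximations of ~6·N·D FLOPs for pretraining and ~2·N FLOPs per generated token for inference; every concrete parameter and token count below is a made-up illustration value, not a figure from any lab:

```python
# Why "big model => distill => RL" can be the compute-efficient recipe.
# Standard approximations: training ~ 6*N*D FLOPs, inference ~ 2*N FLOPs
# per token (N = params, D = tokens). All concrete numbers below are
# hypothetical.

def pretrain_flops(n_params: float, n_tokens: float) -> float:
    """~6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens

def rl_rollout_flops(n_params: float, rollout_tokens: float) -> float:
    """~2 FLOPs per parameter per generated token (forward pass only,
    ignoring KV-cache and attention details)."""
    return 2 * n_params * rollout_tokens

big, small = 1e12, 5e10        # hypothetical 1T-param teacher, 50B student
pretrain_tokens = 15e12        # hypothetical 15T-token pretraining run
rollout_tokens = 1e12          # hypothetical 1T tokens of RL rollouts

print(f"pretrain teacher: {pretrain_flops(big, pretrain_tokens):.1e} FLOPs")
print(f"RL on teacher:    {rl_rollout_flops(big, rollout_tokens):.1e} FLOPs")
print(f"RL on student:    {rl_rollout_flops(small, rollout_tokens):.1e} FLOPs")
# Rollout cost is linear in parameter count, so distilling 1T -> 50B makes
# the same RL token budget ~20x cheaper to sample.
```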
The bigger issue is that entering a 'race' implies a race to the bottom.
I've noted this before, but one of NVDA's biggest risks is that its primary customers are also technical, also make hardware, also have money, and clearly see NVDA's margin (70% gross!!, 50%+ profit) as something they want to eliminate. Google was first to get there (not a surprise), but Meta is also working on its own hardware along with Amazon.
This isn't a doom post for NVDA the company, but its stock price is riding a knife's edge. Any margin or growth contraction will not be a good day for their stock or the S&P.
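To put rough numbers on the margin point (only the ~70% gross margin figure comes from the comment above; the unit cost and volume are hypothetical):

```python
# Rough arithmetic on the vendor margin a hyperscaler pays. Only the
# ~70% gross margin comes from the comment above; the unit cost and
# volume are hypothetical, and this ignores the (large) design and
# software costs of building your own chip.

gross_margin = 0.70
build_cost = 10_000.0  # hypothetical manufacturing cost per accelerator

# gross margin = (price - cost) / price  =>  price = cost / (1 - margin)
sale_price = build_cost / (1 - gross_margin)
margin_per_unit = sale_price - build_cost
print(f"sale price:      ${sale_price:,.0f}")        # ~$33,333
print(f"margin per unit: ${margin_per_unit:,.0f}")   # ~$23,333

units_per_year = 1_000_000  # hypothetical hyperscaler purchase volume
print(f"margin paid per year: ${margin_per_unit * units_per_year / 1e9:.1f}B")
# ~$23B/year of incentive to design in-house silicon instead.
```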
Everything.
They can easily just do this for more optimized chips.
"easily" in sense of that wouldn't require that much investment. Nvidia knows how to invest and has done this for a long time. Their Ominiverse or robots platform isaac are all epxensive. Nvidia has 10x more software engineers than AMD
Valuation isn’t available money; they'd have to raise more money, in an investment environment that is probably tighter for them now, to enter the TPU race. The money they have already raised, which that valuation is based on, is already needed to provide runway for what they are currently doing, without putting any of it into a TPU race.
1. There used to be fixed-function hardware for certain graphics stages.
2. Programmable massively parallel hardware took over. Nvidia was at the forefront of this.
TPUs seem to me similar to fixed-function hardware. For Nvidia it's a step backwards, and even though they have been moving in this direction recently, I can't see them going all the way.
Otherwise you don't need CUDA, but hardware guys who write Verilog or VHDL, and Nvidia doesn't have that much of an edge there.
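A toy sketch of the contrast being drawn, with the caveat that neither half is a real TPU or CUDA API and real TPUs are more programmable than this; it only shows the difference in programming model:

```python
# Fixed-function vs. programmable, in miniature. Both halves are
# illustrative toys, not real hardware interfaces.
import numpy as np

# "TPU-style": the hardware exposes a small menu of fixed, wide primitives.
# You don't program the datapath; you feed it operands of a fixed shape.
MXU_TILE = 128  # hypothetical systolic-array tile size

def mxu_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fixed-function tile matmul: one op, one tile shape, nothing else."""
    assert a.shape == (MXU_TILE, MXU_TILE) and b.shape == (MXU_TILE, MXU_TILE)
    return a @ b

# "GPU-style": the hardware runs an arbitrary per-thread program, so the
# same silicon computes anything expressible as a kernel.
def simt_launch(kernel, n_threads, *args):
    """Toy SIMT model: run kernel(tid, ...) once per thread."""
    for tid in range(n_threads):
        kernel(tid, *args)

def saxpy_kernel(tid, alpha, x, y, out):
    out[tid] = alpha * x[tid] + y[tid]  # any per-element logic could go here

a = np.ones((MXU_TILE, MXU_TILE))
print(mxu_matmul(a, a)[0, 0])  # 128.0 -- the only thing this unit can do

x, y, out = np.ones(4), np.arange(4.0), np.zeros(4)
simt_launch(saxpy_kernel, 4, 2.0, x, y, out)
print(out)  # [2. 3. 4. 5.]
```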
There's a lot of misleading information in what they publish, plagiarism, and, I believe, some information that wouldn't be possible to get without breaking NDAs.
…why would I care about this in the slightest?
I was trying to make the point that SemiAnalysis is semi-famous.