undefined | Better HN

0 pointsblitzar8mo ago0 comments

Or shrink the margin down to just 50% and sell 10x the number of cards (for the week or two it would take Nvidia to announce a 5090 with 128gb)

0 comments

AnthonyMouse8mo ago

Nvidia uses VRAM amount for market segmentation. They can't make a 128GB consumer card without cannibalizing their enterprise sales.

Which means Intel or AMD making an affordable high-VRAM card is win-win. If Nvidia responds in kind, Nvidia loses a ton of revenue they'd otherwise have available to outspend their smaller competitors on R&D. If they don't, they keep more of those high-margin customers but now the ones who switch to consumer cards are switching to Intel or AMD, which both makes the company who offers it money and helps grow the ecosystem that isn't tied to CUDA.

People say things like "it would require higher pin counts" but that's boring. The increase in the amount people would be willing to pay for a card with more VRAM is unambiguously more than the increase in the manufacturing cost.

It's more plausible that there could actually be global supply constraints in the manufacture of GDDR, but if that's the case then just use ordinary DDR5 and a wider bus. That's what Apple does and it's fine, and it may even cost less in pins than you save because DDR is cheaper than GDDR.

It's not clear what they're thinking by not offering this.

blitzarOP8mo ago

> Intel or AMD making an affordable high-VRAM card is win-win.

100% agree. CUDA is a bit of a moat, but the earlier in the hype cycle viable alternatives appear, the more likely the non CUDA ecosystem becomes viable.

> It's not clear what they're thinking by not offering this.

They either dont like making money or have a fantasy that one day soon they will be able to sell pallets of $100,000 GPUs they made for $2.50 like Nvidia can. It doesn't take a PhD and MBA to figure out that the only reason Nvidia have, what should be a short term market available to them is the failings of Intel and AMD and the VC / Innovation side to offer any competition.

It is such an obvious win-win that it would probably be worth skipping the engineering and just announcing the product, for sale by the end of the year and force everyones hand.

prmoustache8mo ago

> The increase in the amount people would be willing to pay for a card with more VRAM is unambiguously more than the increase in the manufacturing cost.

I guess you already have the paper if it is that unambiguous. Would you mond sharing the data/source?

AnthonyMouse8mo ago

The cost of more pins is linear in the number of pins, and the pins aren't the only component of the manufacturing cost, so a card with twice as many pins will have a manufacturing cost of significantly less than twice that of a card with half as many pins.

Cards with 16GB of VRAM exist for ~$300 retail.

Cards with 80GB of VRAM cost >$15,000 and customers pay that.

A card with 80GB of VRAM could be sold for <$1500 with five times the margin of the $300 card because the manufacturing cost is less than five times as much. <$1500 is unambiguously a smaller number than >$15,000. QED.

doctorpangloss8mo ago

> the manufacturing cost is less than five times as much

They don’t manufacture the RAM. This isn’t complicated. They make less margin (a percentage) in your scenario. And that’s what Wall Street cares about.

1 more reply

singhrac8mo ago

This is almost true but not quite - I don't think much of the (dollar) spend on enterprise GPUs (H100, B200, etc.) would transfer if there was a 128 GB consumer card. The problem is both memory bandwidth (HBM) and networking (NVLink), which NVIDIA definitely uses to segment consumer vs enterprise hardware.

I think your argument is still true overall, though, since there are a lot of "gpu poors" (i.e. grad students) who write/invent in the CUDA ecosystem, and they often work in single card settings.

Fwiw Intel did try this with Arctic Sound / Ponte Vecchio, but it was late out the door and did not really perform (see https://chipsandcheese.com/p/intels-ponte-vecchio-chiplets-g...). It seems like they took on a lot of technical risk; hopefully some of that transfers over to a future project though Falcon Shores was cancelled. They really should should have released some of those chips even at a loss, but I don't know the cost of a tape out.

AnthonyMouse8mo ago

NVLink matters if you want to combine a whole bunch of GPUs, e.g. you need more VRAM than any individual GPU is available with. Many workloads exist that don't care about that or don't have working sets that large, particularly if the individual GPU actually has a lot of VRAM. If you need 128GB and you have GPUs with 40GB of VRAM then you need a fast interconnect. If you can get an individual GPU with 128GB, you don't.

There is also work being done to make this even less relevant because people are already interested in e.g. using four 16GB cards without a fast interconnect when you have a 64GB model. The simpler implementation of this is to put a quarter of the model on each card split in the order it's used and then have the performance equivalent of one card with 64GB of VRAM by only doing work on the card with that section of the data in its VRAM and then moving the (much smaller) output to the next card. A more sophisticated implementation does something similar but exploits parallelism by e.g. running four batches at once, each offset by a quarter, so that all the cards stay busy. Not all workloads can be split like this but for some of the important ones it works.

singhrac8mo ago

I think we might just disagree about how much of the GPU spend is on small vs large model (inference or training). I think it’s something like 99.9% of spending interest is on models that don’t fit into 128 GB (remember KV cache matters too). Happy to be proven wrong!

j / k navigate · click thread line to collapse

0 comments

AnthonyMouse8mo ago

Nvidia uses VRAM amount for market segmentation. They can't make a 128GB consumer card without cannibalizing their enterprise sales.

It's not clear what they're thinking by not offering this.

blitzarOP8mo ago

> Intel or AMD making an affordable high-VRAM card is win-win.

100% agree. CUDA is a bit of a moat, but the earlier in the hype cycle viable alternatives appear, the more likely the non CUDA ecosystem becomes viable.

> It's not clear what they're thinking by not offering this.

It is such an obvious win-win that it would probably be worth skipping the engineering and just announcing the product, for sale by the end of the year and force everyones hand.

prmoustache8mo ago

> The increase in the amount people would be willing to pay for a card with more VRAM is unambiguously more than the increase in the manufacturing cost.

I guess you already have the paper if it is that unambiguous. Would you mond sharing the data/source?

AnthonyMouse8mo ago

Cards with 16GB of VRAM exist for ~$300 retail.

Cards with 80GB of VRAM cost >$15,000 and customers pay that.

doctorpangloss8mo ago

> the manufacturing cost is less than five times as much

They don’t manufacture the RAM. This isn’t complicated. They make less margin (a percentage) in your scenario. And that’s what Wall Street cares about.

1 more reply

singhrac8mo ago

I think your argument is still true overall, though, since there are a lot of "gpu poors" (i.e. grad students) who write/invent in the CUDA ecosystem, and they often work in single card settings.

AnthonyMouse8mo ago

singhrac8mo ago

j / k navigate · click thread line to collapse