The new "NVL" variant adds ~20% more memory per GPU by enabling the sixth HBM stack (previously only five out of six were used). Additionally, GPUs now come in pairs with 600GB/s bandwidth between the paired devices. However, the pair then uses PCIe as the sole interface to the rest of the system. This topology is an interesting hybrid of the previous DGX (put all GPUs onto a unified NVLink graph), and the more traditional PCIe accelerator cards (star topology of PCIe links, host CPU is the root node). Probably not an issue, I think PCIe 5.0 x16 is already fast enough to not bottleneck multi-GPU training too much.
I have seen some benchmarks from academia but nothing in the private sector.
I wonder if they thought they were moving too fast and wanted to milk Ampere/Ada as long as possible.
Not having any competition whatsoever means Nvidia can release what they like when they like.
I got an email from Vultr saying that they're "officially taking reservations for the NVIDIA HGX H100", so I guess all the public clouds are going to get those soon.
You can safely assume an entity bought as many as they could.
[1] https://www.qualcomm.com/products/technology/processors/clou...
[2] https://github.com/quic/software-kit-for-qualcomm-cloud-ai-1...
For anything that can be run remotely, it'll always be deployed and optimized server-side first. Higher utilization means more economy.
Then trickle down to local and end user devices if it makes sense.
Centralization of compute has not always won (even if that compute is mostly controlled by a single company). The failure of cloud gaming vs consoles, and the success of Apple (which is very centralized but pushes a lot of ML compute out to the edge) for example.
It was a slap in the face when the 4090 had the same memory capacity as the 3090.
The A6000 is $5,000; ain't no hobbyist at home paying for that.
If you are a business user then you must pay Nvidia gargantuan amounts of money.
This is the outcome of a market leader with no real competition - you pay much more for lower power than the consumer GPUs, and you are forced into using their business GPUs through software license restrictions on the drivers.
Given the size of LLMs, this should be possible with just a little bit of extra VRAM.
ATI seems to be holding the idiot ball.
Port Stable Diffusion and CLIP to their hardware. Train an upsized version sized for a 48GB card. Release a prosumer 48GB card... get huge uptake from artists and creators using the tech.
Whether or not there is real competition depends entirely on whether Intel's Arc line of GPUs stays in the market.
AMD has strangely decided not to compete. Its newest GPU, the 7900 XTX, is an extremely powerful card, close to the top-of-the-line Nvidia RTX 4090 in raster performance.
If AMD had introduced it at an aggressively low price, they could have wedged Nvidia, which is determined to exploit its market dominance by squeezing the maximum money out of buyers.
Instead, AMD has decided to simply follow Nvidia in squeezing for maximum prices, with AMD's prices only slightly below Nvidia's.
It's a strange decision from AMD, which is well behind in market share and apparently disinterested in increasing that share by competing aggressively.
So a third player is needed - Intel - since it's a lot harder for three companies to sit on outrageously high prices for years than to compete with each other for market share.
Since Intel GPUs are again TSMC manufactured, you really aren't going to see price improvements unless Intel subsidizes all of this.
This is not correct.
Much less powerful GPUs represent better value but the market is ridiculously overpriced at the moment.
- The Intel Falcon Shores XPU is basically a big GPU that can use DDR5 DIMMS directly, hence it can fit absolutely enormous models into a single pool. But it has been delayed to 2025 :/
- AMD have not mentioned anything about the (not delayed) MI300 supporting DIMMs. If it doesn't, it's capped to 128GB, and it's being marketed as an HPC product like the MI200 anyway (which you basically cannot find on cloud services).
Nvidia also has some DDR5 Grace CPUs, but the memory is embedded and I'm not sure how much of a GPU they have. Other startups (Tenstorrent, Cerebras, Graphcore and such) seem to have underestimated the memory requirements of future models.
That's the problem. Good DDR5 gives you <100GB/s of memory bandwidth, while Nvidia's HBM goes up to 2TB/s, and memory bandwidth is still the bottleneck for most applications.
Anyway, what I was implying is that simply fitting a trillion-parameter model into a single pool is probably more efficient than splitting it up over a power-hungry interconnect. Bandwidth is much lower, but latency is also lower, and you are shuffling much less data around.
I'm not saying they shouldn't bother with RAM at all, mind you. But given some target price, it's a balance thing between compute and RAM, and right now it seems that RAM is the bigger hurdle.
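To put rough numbers on the bandwidth point (my own back-of-envelope, not benchmarks from anyone in this thread): generating one token requires reading essentially all of the weights once, so memory bandwidth puts a hard ceiling on single-stream token throughput.

```python
# Back-of-envelope only; the bandwidth and model-size numbers are illustrative.
def tokens_per_second(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    weight_gb = params_billion * bytes_per_param   # e.g. 175B params * 2 bytes (fp16) = 350 GB
    return bandwidth_gb_s / weight_gb              # weights are streamed roughly once per token

for name, bw in [("DDR5, ~100 GB/s", 100.0), ("HBM, ~2 TB/s", 2000.0)]:
    print(f"{name}: ~{tokens_per_second(175, 2, bw):.2f} tokens/s for a 175B fp16 model")
```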
Depending on the model, the performance is sometimes not all that different. I believe that for inference alone on some models the speed difference may barely be noticeable, whereas for training it may make a 10+% difference [1]; see the rough sketch after the links below.
[0] https://pytorch.org/tutorials/intermediate/model_parallel_tu...
[1] https://huggingface.co/transformers/v4.9.2/performance.html
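For reference, here is a minimal sketch in the spirit of the naive model-parallel approach the PyTorch tutorial in [0] walks through (the layer sizes and device IDs are made up, and it assumes two visible CUDA devices): half the layers go on each GPU, and the activations cross the GPU-to-GPU link once per forward pass, which is exactly where NVLink vs. PCIe shows up.

```python
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # first half of the layers on one GPU, second half on the other
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to('cuda:0')
        self.part2 = nn.Linear(4096, 1024).to('cuda:1')

    def forward(self, x):
        x = self.part1(x.to('cuda:0'))
        x = self.part2(x.to('cuda:1'))   # activations cross the interconnect here
        return x

model = TwoGPUNet()
out = model(torch.randn(8, 1024))
print(out.shape, out.device)             # torch.Size([8, 1024]) cuda:1
```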
> “The reason we took [NVLink] off is that we need I/O for other things, so we’re using that area to cram in as many AI processors as possible,” Jen-Hsun Huang explained of the reason for axing NVLink.[0]
"NVLink is bad for your games and AI, trust me bro."
But then this card, actually aimed at ML applications, uses it.
0. https://www.techgoing.com/nvidia-rtx-4090-no-longer-supports...
It's also enormously more expensive and I'm not sure if you can buy it new without getting the nvidia compute server.
Previously, GPUs were designed for gamers, and no game really "needs" more than 16 GB of VRAM. I've seen reviews of the A100 and H100 cards saying that 80GB is ample for even the most demanding usage.
Now? Suddenly GPUs with 1 TB of memory could be immediately used, at scale, by deep-pocket customers happy to throw their entire wallets at NVIDIA.
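For a rough sense of the scale (my own rule-of-thumb numbers, not from the thread): fp16 weights are 2 bytes per parameter, and mixed-precision Adam training adds roughly another 12 bytes per parameter for fp32 master weights plus optimizer state.

```python
# Rule-of-thumb sizing only; real usage also depends on activations, KV cache, sharding, etc.
def vram_gb(params_billion: float, training: bool = False) -> float:
    weights = params_billion * 2                    # fp16/bf16 weights: 2 bytes per parameter
    if not training:
        return weights
    # mixed-precision Adam: fp32 master weights + momentum + variance ~= 12 extra bytes/param
    return weights + params_billion * 12

for p in (70, 175, 1000):
    print(f"{p}B params: ~{vram_gb(p):.0f} GB to load, ~{vram_gb(p, training=True):.0f} GB to train")
```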
This new H100 NVL model is a Frankenstein's monster stitched together from whatever they had lying around. It's a desperate move to corner the market as early as possible. It's just the beginning, a preview of the times to come.
There will be a new digital moat, a new capitalist's empire, built upon the scarcity of cards "big enough" to run models that nobody but a handful of megacorps can afford to train.
In fact, it won't be enough to restrict access by making the models expensive to train. The real moat will be models too expensive to run. Users will have to sign up, get API keys, and stand in line.
"Safe use of AI" my ass. Safe profits, more like. Safe monopolies, safe from competition.