Intel definitely seems to be doing all the right things on software support.
This is a huge problem because in theory the Arc A770 is faster! Its theoretical performance (TFLOPS) is more than twice that of an Nvidia 4060 (see: https://cdn.mos.cms.futurecdn.net/Q7WgNxqfgyjCJ5kk8apUQE-120... ). So why does it perform so poorly? Because everything AI-related has been developed and optimized to run on Nvidia's CUDA.
Mostly, this is a mindshare issue. If Intel offered a workstation GPU (i.e. not a ridiculously expensive "enterprise" monster) that developers could use that had something like 32GB or 64GB of VRAM it would sell! They'd sell zillions of them! In fact, I'd wager that they'd be so popular it'd be hard for consumers to even get their hands on one because it would sell out everywhere.
It doesn't even need to be the fastest card. It just needs to offer more VRAM than the competition. Right now, if you want to do things like training or video generation the lack of VRAM is a bigger bottleneck than the speed of the GPU. How does Intel not see this‽ They have the power to step up and take over a huge section of the market but instead they're just copying (poorly) what everyone else is doing.
Intel, screw everything else, just pack as much VRAM in those as you can. Build it and they will come.
It will be a niche product with poor sales.
While the 4090 can run models that use less than 24GB of memory at blistering speeds, models are going to continue to scale up and 24GB is fairly limiting. Because LLM inference can take advantage of splitting the layers among multiple GPUs, high memory GPUs that aren't super expensive are desirable.
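A minimal sketch of what that layer splitting looks like in practice, assuming Hugging Face transformers with accelerate installed (the model name is just an example, swap in whatever you run):

    # Let accelerate shard the model's layers across whatever GPUs are visible.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-13b-hf"  # example model, not a recommendation
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # halves the weight footprint vs fp32
        device_map="auto",          # splits layers across all available GPUs
    )

    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))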
To share a personal perspective, I have a desktop with a 3090 and an M1 Max Studio with 64GB of memory. I use the M1 for local LLMs because I can give them up to ~57GB of memory, even though the output (in terms of tok/s) is much slower than models I can fit on the 3090.
But selling to machine learning enthusiasts is not a bad place to be. A lot of these enthusiasts are going to go on to work at places that are deploying enterprise AI at scale. Right now, almost all of their experience is CUDA and they're likely to recommend hardware they're familiar with. By making consumer Intel GPUs attractive to ML enthusiasts, Intel would make their enterprise GPUs much more interesting for enterprise.
It doesn't even matter if that's your primary goal or not.
Frustrated AMD customers willing to put their money where their mouth is?
>4090
That's noob hardware. The A6000 is my choice.
Which really only further emphasizes your point.
>CPU based is a waste of everyone's time/effort
>GPU based is 100% limited by VRAM, and is what you are realistically going to use.
If Intel sells a stackable kit with a lot of RAM and a reasonable interconnect, a lot of corporate customers will buy it. It doesn't even have to be that good, just halfway between PCIe 5.0 and NVLink.
But it seems they are still too stuck in their old ways. I wouldn't count on them waking up. Nor AMD. It's sad.
Which I don't care too much about.
However, even 16->24GB is a big step, since a lot of the models are developed for 3090/4090-class hardware. 36GB would place it close to the class of the fancy 40GB data center cards.
If Intel decided to push VRAM, it would definitely have a market. Critically, a lot of folks would also be incentivized to make their software compatible, since it would be the cheapest way to run models.
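Napkin math on why those VRAM tiers matter, counting weights only (the KV cache and activations add more on top):

    # Back-of-the-envelope VRAM for just the weights: params * bytes_per_param
    def weights_gb(params_billion, bytes_per_param):
        return params_billion * 1e9 * bytes_per_param / 2**30

    for params in (7, 13, 34, 70):
        print(f"{params}B: fp16={weights_gb(params, 2):.0f}GB, "
              f"int4={weights_gb(params, 0.5):.0f}GB")

A 13B model at fp16 is already ~24GB before the KV cache, which is exactly why 24GB cards feel tight.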
I want a consumer card that can do some number of tokens per second. I do not need a monster that can serve as the basis for a startup.
I heard some ASRock motherboard BIOSes can set the VRAM up to 64GB on Ryzen 5.
Doing some investigations with different AMD hardware atm.
Ryzen 5 has both the CPU and GPU on one chip, and the BIOS lets you set the amount of VRAM. They share the same RAM bank: with a 32GB bank you can set 16GB as VRAM and leave 16GB for the OS.
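If you want to confirm what the BIOS carve-out actually exposes, something like this should work, assuming a ROCm build of PyTorch that supports your APU (which is its own adventure):

    # Quick check of how much VRAM the framework can actually see.
    import torch

    if torch.cuda.is_available():  # ROCm devices show up through the cuda API
        props = torch.cuda.get_device_properties(0)
        print(f"{props.name}: {props.total_memory / 2**30:.1f} GB VRAM")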
The serious crypto and AI nuts are all using custom hardware. Crypto moved onto ASICs for anything power-efficient, and Nvidia's DGX systems aren't being cannibalized from the gaming market.
Seems like we just need consumer matrix math cards with literally no video out, and then a different set of requirements for those with a video out.
But then those pesky researchers and hackers figured out how to use the matmul hardware for non-gaming.
Right now, the best discriminator they have is that PC users are willing to put up with much smaller amounts of VRAM.
Can you elaborate on this? Intel's reputation for software support hasn't been stellar, what's changed?
The same thing would make a lot of sense here. Super-fast memory close to the GPU, with overflow into classic DDR slots.
As a footnote, going parallel also helps: eight slow sticks of RAM in parallel give the same aggregate bandwidth as one stick that's eight times as fast, as long as you don't multiplex them onto the same traces.
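The arithmetic, using DDR5-4800 as the per-channel baseline:

    # One DDR5-4800 channel: 4800 MT/s * 8 bytes per transfer = 38.4 GB/s
    per_channel = 38.4
    print(8 * per_channel)  # eight independent channels -> 307.2 GB/s aggregate,
                            # same as one hypothetical channel running 8x as fast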
You can pick them up in prebuilds from Dell and Supermicro: https://www.supermicro.com/en/accelerators/intel
Read more about them here: https://www.servethehome.com/intel-shows-gpu-max-1550-perfor...
- SYCL [1]
- Vulkan
- OpenCL
I don't own the hardware, but I imagine SYCL is the most performant option for Arc, because it's the one Intel is pushing for their datacenter stuff (rough sketch below).
[1]: https://www.intel.com/content/www/us/en/developer/articles/t...
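Untested sketch, since I don't own the hardware either, but SYCL is what backs Intel's PyTorch extension, which exposes Arc GPUs as an "xpu" device:

    # Assumes an XPU build of intel_extension_for_pytorch is installed.
    import torch
    import intel_extension_for_pytorch as ipex  # registers the xpu backend

    if torch.xpu.is_available():
        a = torch.randn(4096, 4096, device="xpu", dtype=torch.float16)
        b = torch.randn(4096, 4096, device="xpu", dtype=torch.float16)
        print((a @ b).float().abs().mean().item())  # matmul runs on the Arc GPU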
16GB of VRAM and performance around a 4060 Ti or so, but for 65% of the price
For all their hardware research hiccups in the last 10 years, they've been delivering on open source machine learning libraries.
Apparently the same goes for driver improvements and gaming GPU features over the last year.