
kulahan · 1y ago

I'm very happy to hear this; maybe it's finally time to buy a ton of RAM for my PC! A local, private LLM would be great. I'd try talking to it about stuff I don't feel comfortable having on OpenAI's servers.

4 comments · 1 top-level

lysace · 1y ago · 3 in thread

Getting lots of RAM will let you run large models on the CPU, but it will be very slow.

The Apple Silicon Macs have memory shared between the CPU and GPU, which lets the GPU (relatively underpowered compared to a decent Nvidia GPU) run these models with llama.cpp at decent speeds compared with a CPU.
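A rough way to reason about the speed gap, as a sketch rather than a benchmark: when generating one token at a time, the hardware has to read roughly all of the model's weights from memory for each token, so memory bandwidth puts a ceiling on tokens per second. That rule of thumb is an assumption added here, not something stated in the thread, and the bandwidth figures below are illustrative placeholders, not measurements of any particular machine.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on single-stream decoding speed.

    Assumes every generated token requires reading all model weights once,
    so throughput cannot exceed bandwidth / model size. Real speeds are lower.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    model_size_gb = 4.0  # hypothetical quantized model size
    # Illustrative bandwidths only: a slower and a faster memory system.
    for label, bandwidth in [("slower memory", 50.0), ("faster memory", 400.0)]:
        bound = max_tokens_per_second(bandwidth, model_size_gb)
        print(f"{label}: at most ~{bound:.1f} tokens/s")
```

Under these assumptions, the same model is bounded at very different speeds purely because of how fast the weights can be streamed from memory, which is one plausible reason large unified memory with high bandwidth helps.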

This should all get dramatically better/faster/cheaper within a few years, I suspect. Capitalism will figure this one out.

kulahan (OP) · 1y ago

Interesting, so this is a Mac-specific solution? That's pretty cool.

I assume, then, that the primary goal would be to drop in the beefiest GPU possible when on Windows/Linux?

There's nothing Mac-specific about running LLMs locally; Macs just happen to be a convenient way to get a ton of VRAM in a single small, power-efficient package.

On Windows and Linux, yes, you'll want at least 12GB of VRAM for it to be of much use, but the beefiest consumer GPUs still top out at 24GB, which is pretty limiting.
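To put the 12GB and 24GB figures in context, here is a back-of-the-envelope sketch of how much memory a model's weights need at a given precision. The formula (parameters times bits per weight, divided by 8) covers weights only; the `overhead_gb` allowance for context cache and runtime buffers is a hypothetical placeholder, and the function names are made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in vram_gb.

    overhead_gb is a rough guess, not a measured value.
    """
    return weight_memory_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for params in (7, 13, 70):
        for bits in (16, 4):
            size = weight_memory_gb(params, bits)
            print(f"{params}B @ {bits}-bit: ~{size:.1f} GB, "
                  f"fits 12GB: {fits(params, bits, 12)}, "
                  f"fits 24GB: {fits(params, bits, 24)}")
```

By this estimate, a 7-billion-parameter model at 16 bits needs about 14 GB for weights alone, while a 70-billion-parameter model quantized to 4 bits still needs about 35 GB, which is why a 24GB card is limiting for the largest models.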

lysace1y ago

With Windows/Linux, I think the issue is that Nvidia is artificially limiting the amount of onboard RAM (they want to sell those devices for 10x more to OpenAI, etc.) and that AMD, for whatever reason, can't get their shit together.

I'm sure there are much more knowledgeable people here on this topic, though.
