0 points · bigyabai · 1mo ago · 0 comments

> Why didn't you take into account [...] the fact that a laptop can still hold a decent % of its resale value, and is useful for many other tasks than running an LLM?

Because that wasn't what they claimed to research?

  >> for inference it's definitely not worth it.

It's entirely fine if you enjoy local LLMs on your computer; there are people doing horribly inefficient inference on smartphones now. But for pure inference tasks, it's pretty obvious why M5s and Mac Studios aren't replacing TPUs and GPUs.

6 comments · 1 top-level

joefourier · 1mo ago · 5 in thread

Who is going to buy a $4299 M5 Max MBP with 64GB of RAM just to run Gemma 4 31b? Firstly, you don't need 64GB for that model. Secondly, if you want a machine that sits in the corner and does nothing but LLM inference, you don't buy a MacBook Pro; you buy some GPUs, which are going to cost you a fraction of that (~$1k for ~64GB of VRAM is possible). The people buying Apple Silicon for inference generally aim for the Mac Studios with enormous amounts of RAM (128-512GB), to run very large models.

The idea is obviously to run the LLM on your work laptop. As a developer I'd need a laptop with 24GB of RAM for work anyway, and 48GB, which is enough for a very good quant of Gemma, is just $400 extra.

gbgarbeb · 1mo ago

24GB GPUs are $700-2500. Please show me the 64GB GPU for $1k.

joefourier · 1mo ago

Not a single new 64GB GPU, but multiple used GPUs.

They’ve significantly increased in price (so much for hardware depreciation…) but you can still get a modded 22GB 2080 Ti for $320, or an MI50 32GB for ~$450 each (used to be $150 a few months ago, alas), or an MI50 16GB for <$200, but you’d need to stack 4 of them.

There are also some more exotic configurations, but those are probably the simplest options. You won’t get the performance of an RTX Pro 6000 Blackwell of course, and the power consumption will be pretty high, so it’s only worth it if you have cheap electricity. But it is possible.
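To make the "~$1k for ~64GB" arithmetic concrete, here is a rough sketch that counts how many of each card it takes to reach 64GB and what that costs. The prices and capacities are the ones quoted above; the 16GB MI50 is only given as "<$200", so 200 is treated as a ceiling. Motherboard, PSU, risers, and electricity are deliberately left out.

```python
import math

# (name, VRAM in GB, approximate used price in USD), as quoted in the comment.
# The 16GB MI50 price is an upper bound ("<$200"), not an exact figure.
CARDS = [
    ("RTX 2080 Ti 22GB (modded)", 22, 320),
    ("MI50 32GB", 32, 450),
    ("MI50 16GB", 16, 200),
]


def cards_needed(vram_gb, target_gb=64):
    """Smallest number of identical cards whose combined VRAM reaches the target."""
    return math.ceil(target_gb / vram_gb)


def build_options(target_gb=64):
    """Return (name, card count, total VRAM in GB, total price in USD) per card type."""
    options = []
    for name, vram, price in CARDS:
        n = cards_needed(vram, target_gb)
        options.append((name, n, n * vram, n * price))
    return options


for name, n, total_vram, total_price in build_options():
    print(f"{n} x {name}: {total_vram} GB for about ${total_price}")
```

All three options land between roughly $800 and $960 for the cards alone, which is where the ~$1k figure comes from.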

zozbot234 · 1mo ago

> Firstly you don't need 64GB for that model.

You might need that to run it with a longer context; KV cache size is a known issue with that model series.
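For a sense of how the KV cache scales with context, a back-of-the-envelope sketch follows. The layer count, KV head count, and head dimension are hypothetical placeholders, not the real configuration of any Gemma model, and the formula assumes plain full attention in every layer with an unquantized 16-bit cache. Sliding-window layers or a quantized cache would shrink these numbers.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size for one sequence with full attention in every layer."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# HYPOTHETICAL architecture numbers, chosen only for illustration.
HYPOTHETICAL = dict(n_layers=60, n_kv_heads=8, head_dim=128)

for ctx in (8_192, 65_536, 262_144):
    gib = kv_cache_bytes(context_len=ctx, **HYPOTHETICAL) / 2**30
    print(f"{ctx:>7} tokens: {gib:.1f} GiB")
```

The cache grows linearly with context length, so under these assumptions a 32x longer context costs 32x the memory, on top of the weights.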

vardump · 1mo ago

> Gemma 4 31b? Firstly you don't need 64GB for that model.

You don't? It for sure doesn't run on my 32 GB M2 Max.

joefourier · 1mo ago

What quant? You should have no problem running it at Q4 with 256K context, or even at Q5 or Q6, although maybe not at full context. I can run Q4 on a 4090 with just 24GB VRAM.
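As a rough check on which quants fit where, the weights alone can be estimated as parameter count times bits per weight. The bits-per-weight values below are assumptions for illustration: real quantized files store per-block scales and mix precisions, so actual sizes differ, and the KV cache and runtime overhead come on top.

```python
def weight_gb(n_params, bits_per_weight):
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 31e9  # "31b" from the model name

# ASSUMED effective bits per weight; not taken from any specific quant format.
ASSUMED_BITS = {"Q4": 4.5, "Q5": 5.5, "Q6": 6.5, "FP16": 16.0}

for name, bits in ASSUMED_BITS.items():
    print(f"{name}: ~{weight_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions Q4 weights come to around 17GB, which is consistent with fitting on a 24GB card, while how much context fits alongside them depends on the cache.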
