
Terretta · 1y ago

Compare performance on various Macs here as it gets updated:

https://github.com/ggerganov/llama.cpp/discussions/4167

On my machine (a MacBook Pro Max with 128GB), Llama 3.3 70B generates text at ~7 tokens per second, and the output feels like GPT-4's, with more in-depth responses and fewer bullets. Llama 3.3 70B also doesn't fight the system prompt; it leans in.

Consider, e.g., LM Studio (0.3.5 or newer) for a Metal (MLX)-centered UI, and include "MLX" in your search term when downloading models.

Also, do not scrimp on storage. At 60GB to 100GB per model, it takes a day of experimentation to use 2.5TB of storage in your model cache. And remember to exclude that path from your Time Machine backups.
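To put the storage numbers above in perspective, here is a minimal sketch of the arithmetic. The function name `models_that_fit` is hypothetical, and the sizes are just the 60GB to 100GB range and the 2.5TB figure quoted above (taking 1TB as 1000GB).

```python
def models_that_fit(cache_gb: float, model_gb: float) -> int:
    """Return how many models of a given size fit in a cache of a given size."""
    return int(cache_gb // model_gb)


cache_gb = 2500  # 2.5TB, treating 1TB as 1000GB (an assumption)

# Range of model counts for the per-model sizes quoted above.
fewest = models_that_fit(cache_gb, 100)
most = models_that_fit(cache_gb, 60)
print(f"2.5TB holds roughly {fewest} to {most} models")
```

So a few dozen downloads at those sizes is enough to fill the cache.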

8 comments · 2 top-level

ant6n · 1y ago

What if you have a MacBook Air with 16GB? (The benchmarks don't seem to show memory.)

simonw · 1y ago

You could definitely run an 8B model on that, and some of those are getting very capable now.

The problem is that often you can't run anything else. I've had trouble running larger models in 64GB when I've had a bunch of Firefox and VS Code tabs open at the same time.

xdavidliu · 1y ago

I thought VS Code was supposed to be lightweight, though I suppose with extensions it can add up.

evilduck · 1y ago

8B models with larger contexts, or even quantized 9-14B parameter models.

Qwen2.5 Coder 14B at a 4-bit quantization could run, but you will need to be diligent about what else you have in memory at the same time.
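A rough way to sanity-check what fits: weight memory is approximately parameter count times bits per weight, divided by 8. The sketch below uses a hypothetical helper, `weight_gb`, and deliberately ignores the KV cache, quantization metadata, and runtime overhead, so real usage will be higher.

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1GB = 1e9 bytes).

    Ignores KV cache, quantization scales, and runtime overhead.
    """
    return params_billions * bits_per_weight / 8


# Model sizes mentioned in this thread, at 4-bit and 16-bit weights.
for params in (8, 14, 70):
    print(f"{params}B: ~{weight_gb(params, 4):.1f}GB at 4-bit, "
          f"~{weight_gb(params, 16):.1f}GB at 16-bit")
```

By this estimate, a 14B model at 4 bits needs about 7GB for weights, which is why it is feasible on 16GB only if little else is resident.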

chris_st · 1y ago

I have an M2 Air with 24GB, and have successfully run some 12B models such as mistral-nemo. I had other stuff going as well, but it's best to give it as much of the machine as possible.

gcanyon · 1y ago

I recently upgraded to exactly this machine for exactly this reason, but I haven't taken the leap and installed anything yet. What's your favorite model to run on it?

noman-land · 1y ago

Thank you for all the tips! I'd probably go 128GB 8TB because of masochism. Curious: what puts so many of the M4s in the red currently?

vessenes · 1y ago

It's all memory-bandwidth related -- what's slow is loading these models into memory, basically. The last die from Apple with all the channels was the M2 Ultra, and I bet that's what tops those leaderboards. M4 has not had a Max or an Ultra release yet; when it does (and it seems likely it will), those will be the ones to get.
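As a back-of-the-envelope illustration of the bandwidth point: if each generated token requires reading all the weights once, then tokens per second is at most memory bandwidth divided by the size of the weights. This is a minimal sketch under that assumption; the helper name and the 400 and 800 GB/s figures are illustrative placeholders, not measurements of any machine in this thread.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on generation speed if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


weights_gb = 35.0  # assumed: a 70B model at roughly 4 bits per weight

# Illustrative bandwidth values only (assumptions, not benchmarks).
for bandwidth in (400.0, 800.0):
    bound = max_tokens_per_sec(bandwidth, weights_gb)
    print(f"{bandwidth:.0f} GB/s -> at most ~{bound:.1f} tokens/s")
```

Real throughput lands below this bound, but the scaling shows why doubling the bandwidth matters more than a newer core for text generation.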
