
smokel 1y ago

Running LLMs on that kind of hardware will be very slow (expect responses with only a few words per second, which is probably pretty annoying).

LM Studio [1] makes it very easy to run models locally and play with them. Llama 3.1 will only run in quantized form with 16GB RAM, and that cripples it quite badly, in my opinion.

You could try Phi-3 Mini, which has only 3.8B parameters and can still do fun things.

[1] https://lmstudio.ai/
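
To put rough numbers on the 16GB remark above, here is a back-of-the-envelope sketch of how much memory the weights alone take at different precisions. The 3.8B figure comes from the comment; the 8B figure assumes the smallest Llama 3.1 variant is meant, which the thread does not state. The function name is made up for illustration, and the estimate ignores the KV cache, activations, and whatever the OS needs.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts: 3.8B is from the comment above.
# 8B is an assumption (smallest Llama 3.1 variant); the thread does not say.
MODELS = {
    "Llama 3.1 8B (assumed)": 8e9,
    "Phi-3 Mini": 3.8e9,
}

if __name__ == "__main__":
    for name, n_params in MODELS.items():
        for bits in (16, 8, 4):
            gb = weight_memory_gb(n_params, bits)
            print(f"{name:24s} {bits:2d}-bit: {gb:5.1f} GB")
```

Under these assumptions, fp16 weights for an 8B model already come to roughly 16 GB, which is why quantization is needed on a 16GB machine, while a 3.8B model at fp16 is closer to 7.6 GB.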


2 comments · 2 top-level

wkat4242 1y ago

To be honest, I don't find Llama 3.1 noticeably worse with 8-bit integer quantisation than the original fp16. It's also a lot faster.

Of course, even then you're not going to fit the whole 128k context window in 16GB, but if you don't need that, it works great.
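
A rough sketch of why the full 128k context is out of reach on 16GB: the KV cache grows linearly with the number of tokens. The numbers below are assumptions, not something stated in the thread: the 8B variant of Llama 3.1 (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and an unquantized fp16 cache. The function name is hypothetical.

```python
def kv_cache_bytes(
    n_tokens: int,
    n_layers: int = 32,        # assumed: Llama 3.1 8B
    n_kv_heads: int = 8,       # assumed: grouped-query attention, 8 KV heads
    head_dim: int = 128,       # assumed: 4096 hidden size / 32 attention heads
    bytes_per_value: int = 2,  # assumed: fp16 cache, no cache quantization
) -> int:
    """Approximate KV cache size: keys plus values for every layer and token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * n_tokens


if __name__ == "__main__":
    for tokens in (8 * 1024, 32 * 1024, 128 * 1024):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens:7d} tokens: {gib:5.1f} GiB of KV cache")
```

With these assumed values the cache costs 128 KiB per token, so a full 128k-token context would need about 16 GiB for the cache alone, before counting the weights. Shorter contexts of a few thousand tokens stay well under 1 GiB.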

eth0up 1y ago

Much appreciated. Thanks for this!

