Better HN

arcanemachiner · 1y ago · 0 points · 0 comments

Setting up Ollama via Docker was the easiest way for me to get up and running. Not 100% sure if it fits your constraints, but highly recommended.
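For reference: once the container is up (the Ollama docs use `docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama`), it serves an HTTP API on port 11434. Here's a minimal standard-library Python sketch for calling it. The model name `llama3` is only an assumption; use whatever model you pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default port published by the container


def build_generate_request(prompt: str, model: str = "llama3",
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3") -> str:
    """Send the request and return the generated text (needs the server running)."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the server returns a single JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

Usage: `print(generate("Why is the sky blue?"))` once the container is running and the model has been pulled.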


2 comments · 1 top-level

programd · 1y ago · 1 in thread

Another option is to download and compile llama.cpp; with it you should be able to run quantized models at an acceptable speed.
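Rough arithmetic for why quantization matters on limited RAM: the weights take about parameters × bits-per-weight / 8 bytes. This sketch ignores the per-block scale overhead of llama.cpp's quantization formats and the context (KV cache) memory, so treat the numbers as lower bounds. The model sizes in the loop are just illustrative.

```python
def weights_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10**9 bytes).

    Ignores quantization block overhead and runtime buffers such as the
    KV cache, so the real memory footprint will be somewhat larger.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Compare full 16-bit weights against 8-bit and 4-bit quantization.
for params in (7e9, 13e9, 70e9):
    for bits in (16, 8, 4):
        size = weights_size_gb(params, bits)
        print(f"{params / 1e9:>4.0f}B @ {bits:>2}-bit ~ {size:6.1f} GB")
```

For example, a 7B model needs roughly 14 GB for its weights at 16 bits but only about 3.5 GB at 4 bits.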

Also, if you can spend $60 on another 32GB of RAM, you'll be able to run the 30GB models quite nicely.

Unfortunately, my motherboard is capped at 16GB of RAM.
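A back-of-the-envelope fit check for the numbers in this exchange (30GB models, 16GB cap). The 2 GB reserved for the OS and the 20% headroom for context and runtime buffers are assumed values, not measurements.

```python
def fits_in_ram(model_gb: float, ram_gb: float,
                os_reserve_gb: float = 2.0, headroom: float = 0.2) -> bool:
    """Return True if a model of model_gb should fit entirely in RAM.

    os_reserve_gb and headroom are assumptions: memory left for the OS,
    and extra space (as a fraction of the model size) for the KV cache
    and other runtime buffers.
    """
    needed = model_gb * (1 + headroom) + os_reserve_gb
    return needed <= ram_gb


print(fits_in_ram(30, 16))   # False: a 30GB model needs ~38 GB here
print(fits_in_ram(6.5, 16))  # True: e.g. a 13B model at 4 bits
```

Under these assumptions a 30GB model is out of reach on a 16GB board, while a 4-bit 13B model (about 6.5 GB of weights) fits comfortably.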
