
0 points · cjbprime · 2y ago · 0 comments

Wouldn't expect that to work at all.


1 comment · 1 top-level

Ollama (which wraps llama.cpp) supports splitting a model across devices so you get some acceleration even on models too big to fit entirely in GPU memory.
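To make the idea of a partial offload concrete, here is a rough sketch of the arithmetic. It assumes every layer is the same size and ignores the KV cache and other overhead, so it is not how Ollama or llama.cpp actually pick the split; `split_layers` and the numbers below are hypothetical, for illustration only. In practice the layer count is what you would hand to llama.cpp's `--n-gpu-layers` flag or Ollama's `num_gpu` option.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def split_layers(n_layers: int, layer_bytes: int, gpu_free_bytes: int) -> tuple[int, int]:
    """Greedy split: put as many whole layers on the GPU as fit, rest on the CPU.

    Simplifying assumptions (not taken from Ollama or llama.cpp):
    all layers are the same size, and nothing else needs GPU memory.
    """
    if n_layers < 0 or layer_bytes <= 0:
        raise ValueError("n_layers must be >= 0 and layer_bytes must be > 0")
    # Number of whole layers that fit in the free GPU memory, capped at n_layers.
    gpu_layers = min(n_layers, max(0, gpu_free_bytes) // layer_bytes)
    return gpu_layers, n_layers - gpu_layers


if __name__ == "__main__":
    # Made-up example: 80 layers of 500 MiB each, 24 GiB of free GPU memory.
    gpu, cpu = split_layers(80, 500 * MIB, 24 * GIB)
    print(f"GPU layers: {gpu}, CPU layers: {cpu}")
```

The layers left on the CPU still run, just slower, which is why a model that doesn't fit in GPU memory can still see some speedup.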
