Don't have a $5k MacBook to run LLAMA65B? MiniLLM runs LLMs on GPUs in <500 LOC

(github.com)

3 points · volodia · 3y ago · 2 comments

2 comments

Doesn't this use as much VRAM as llama.cpp (with int4 models) uses RAM? RAM is a lot cheaper than VRAM.

volodia (OP) · 3y ago

It won't run as fast on your CPU as it will on a GPU. Also, it might clog most of your RAM; it's better to offload to a cheap GPU.
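
For a rough sense of the sizes being discussed, here is a back-of-the-envelope sketch. It is not taken from MiniLLM or llama.cpp. It assumes nominal parameter counts (7B, 13B, 30B, 65B) and counts the weights only, ignoring activations, the KV cache, and any per-group quantization scales, so real usage will be somewhat higher. The helper name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width and
    nothing else (activations, KV cache, quantization scales) is counted.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    # Nominal model sizes (assumed round numbers, not exact parameter counts).
    models = {"7B": 7e9, "13B": 13e9, "30B": 30e9, "65B": 65e9}
    for name, n_params in models.items():
        int4 = weight_memory_gib(n_params, 4)
        fp16 = weight_memory_gib(n_params, 16)
        print(f"{name}: int4 ~{int4:.1f} GiB, fp16 ~{fp16:.1f} GiB")
```

At the same bit width the estimate is identical whether the weights sit in RAM or VRAM, so the trade-off in this exchange comes down to cost per gigabyte versus speed.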
