canpan
1mo ago
Use llama.cpp with automatic offload to main memory. You can also use Ollama; it is easier, but slower.
reverius42
1mo ago
For those who want a GUI, LM Studio does this too (with llama.cpp as the backend, I think). I'm getting great (albeit slow) results with Qwen3.6-35B MoE on 8GB of GPU RAM and 40GB of system RAM.
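
To make the GPU/system RAM split concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration, not a measurement: the 4.5 bits per weight, the 48-layer count, the 1.5 GB VRAM reserve, and the helper names `model_size_gb` and `split_layers` are all hypothetical. It also treats every layer as the same size, which is a simplification for MoE models. The result is only a ballpark for the kind of layer count you might pass to llama.cpp's `--n-gpu-layers` option.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (weights only)."""
    return params_billion * bits_per_weight / 8


def split_layers(model_gb: float, n_layers: int, vram_gb: float,
                 reserve_gb: float = 1.5) -> tuple[int, float]:
    """Estimate how many equally sized layers fit in VRAM.

    reserve_gb is an assumed allowance for the KV cache and scratch
    buffers. Returns (layers on the GPU, GB left in system RAM).
    """
    per_layer_gb = model_gb / n_layers
    usable_vram = max(vram_gb - reserve_gb, 0.0)
    gpu_layers = min(n_layers, int(usable_vram // per_layer_gb))
    cpu_gb = model_gb - gpu_layers * per_layer_gb
    return gpu_layers, cpu_gb


if __name__ == "__main__":
    # Assumed figures: 35B parameters, ~4.5 bits per weight, 48 layers.
    size = model_size_gb(35, 4.5)
    layers, cpu_gb = split_layers(size, n_layers=48, vram_gb=8)
    print(f"weights: {size:.1f} GB, GPU layers: {layers}, "
          f"left in system RAM: {cpu_gb:.1f} GB")
```

Under these assumptions, only part of the model fits in 8GB of GPU RAM and the rest has to sit in system RAM, which is consistent with the "great but slow" experience described above.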