0 points · sipjca · 9d ago · 0 comments

FWIW, because relatively few params are activated per token, offloading to system RAM is quite feasible. You can see plenty of people doing this on r/localllama with qwen3.6 35a3b.
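
To put rough numbers on that claim, here is a back-of-envelope sketch. It assumes "35a3b" means about 35B total and 3B active parameters, plain 4-bit weights, and 40 GB/s of usable system-RAM bandwidth. The bandwidth figure and the function names are illustrative assumptions, not something from the thread, and KV cache and quantization overhead are ignored.

```python
# Back-of-envelope: why few active parameters make RAM offload workable.
# Assumptions (not from the thread): ~35e9 total / ~3e9 active parameters,
# 4-bit weights (0.5 bytes each), 40 GB/s of usable system-RAM bandwidth.

def weight_bytes(params: float, bits_per_param: float = 4.0) -> float:
    """Approximate bytes needed to store `params` quantized weights."""
    return params * bits_per_param / 8.0


def decode_ceiling_tokens_per_s(active_params: float,
                                bandwidth_bytes_per_s: float,
                                bits_per_param: float = 4.0) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    return bandwidth_bytes_per_s / weight_bytes(active_params, bits_per_param)


ASSUMED_BANDWIDTH = 40e9  # bytes/s, a hypothetical figure for illustration

total_gb = weight_bytes(35e9) / 1e9    # all experts must be stored somewhere
active_gb = weight_bytes(3e9) / 1e9    # only this much is touched per token
moe_ceiling = decode_ceiling_tokens_per_s(3e9, ASSUMED_BANDWIDTH)
dense_ceiling = decode_ceiling_tokens_per_s(35e9, ASSUMED_BANDWIDTH)

print(f"stored weights:   {total_gb:.1f} GB")
print(f"read per token:   {active_gb:.1f} GB")
print(f"MoE ceiling:      {moe_ceiling:.1f} tok/s")
print(f"dense equivalent: {dense_ceiling:.1f} tok/s")
```

The point is the ratio rather than the absolute numbers: under these assumptions, the bandwidth-bound ceiling scales with active parameters, not total, so a sparse model is far less punished for living in system RAM than a dense model of the same size.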

3 comments · 1 top-level

bitwize · 9d ago · 2 in thread

I ran Gemma4 26B A4B on an 8yo PC with a fucking GTX and it did rather well.

doodlesdev · 9d ago

Well, that's pretty impressive. Care to share your setup to do that? How much DDR3/DDR4 do you have, too?

bitwize · 9d ago

I... downloaded a 4-bit quantized GGUF of the model, used llama.cpp to run it, and pointed OpenCode at that. My machine is an 8-core Gen1 Ryzen 7 with 32 GiB of DDR4 and (I think) 4 GiB of VRAM on the graphics card.
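
As a rough sanity check on why that machine has enough room, here is a minimal sketch. It assumes "26B" means 26e9 parameters at exactly 4 bits each and that about 1 GiB of VRAM is held back for the KV cache and display. Both the reserve and the helper names are hypothetical, and real 4-bit GGUF files are typically somewhat larger than this estimate.

```python
# Capacity check: does a 4-bit 26B model fit in 32 GiB RAM plus 4 GiB VRAM?
# Assumptions (not from the thread): exactly 4 bits per weight and 1 GiB of
# VRAM reserved for KV cache and display; real GGUF quants carry extra overhead.

GIB = 1024 ** 3


def quantized_weights_gib(total_params: float, bits_per_param: float = 4.0) -> float:
    """Approximate size of the quantized weights in GiB."""
    return total_params * bits_per_param / 8.0 / GIB


def split_weights(weights_gib: float, vram_gib: float,
                  vram_reserved_gib: float = 1.0) -> tuple[float, float]:
    """Return (GiB placed on the GPU, GiB left in system RAM)."""
    usable_vram = max(vram_gib - vram_reserved_gib, 0.0)
    on_gpu = min(weights_gib, usable_vram)
    return on_gpu, weights_gib - on_gpu


weights = quantized_weights_gib(26e9)
on_gpu, in_ram = split_weights(weights, vram_gib=4.0)
fits_in_ram = in_ram < 32.0

print(f"weights: {weights:.1f} GiB, GPU: {on_gpu:.1f} GiB, RAM: {in_ram:.1f} GiB")
print("fits" if fits_in_ram else "does not fit")
```

Under these assumptions most of the weights sit in system RAM, with plenty of the 32 GiB left over, which is consistent with the setup working on a small GPU.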
