
zkmon · 1d ago · 0 points

I have a lot of respect for unsloth's work, which has helped millions get started with local AI, but this post reads a bit like download bait.

Offloading too many layers to the CPU just doesn't work in practice. I have tried it many times and ended up having to rm -rf those heavy HF cache folders. I also doubt that 1-bit or 2-bit quants of GLM 5.2, running mostly outside of VRAM, can beat Q8_0 of Qwen3.6-27B fully loaded in VRAM when it comes to usefulness.
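For a rough sense of the arithmetic behind this, here is a back-of-the-envelope sketch. It only counts the weights (no KV cache, activations, or runtime overhead), and the function names, the 32 GiB VRAM figure, and the 700B parameter count are hypothetical placeholders, not numbers from the post or from any model card. The 8.5 bits per weight for Q8_0 comes from the GGUF block layout (32 int8 values plus one fp16 scale).

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


def offloaded_fraction(weights_gib: float, vram_gib: float) -> float:
    """Share of the weights that does not fit in VRAM and ends up in system RAM."""
    if weights_gib <= vram_gib:
        return 0.0
    return 1.0 - vram_gib / weights_gib


if __name__ == "__main__":
    vram = 32.0  # hypothetical GPU memory budget in GiB

    # 27B model at Q8_0 (8.5 bits per weight including block scales).
    dense = weight_gib(27, 8.5)
    # Placeholder for a much larger model at roughly 2 bits per weight.
    big = weight_gib(700, 2.0)

    for name, size in [("27B @ Q8_0", dense), ("700B @ ~2-bit (placeholder)", big)]:
        share = offloaded_fraction(size, vram)
        print(f"{name}: {size:.1f} GiB of weights, {share:.0%} outside VRAM")
```

Plug in the real parameter count and the effective bits per weight of the quant you downloaded. The larger the share outside VRAM, the more each token depends on system RAM bandwidth instead of GPU memory bandwidth.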


2 comments · 1 top-level

iaw · 1d ago

I run 3-bit GLM5.2 and full-precision Qwen3.6-27B. GLM is much, much closer to frontier models in its breadth and ability to plan. If you just need to implement Python code from an existing plan, Qwen is your choice, but it struggles with more complex tasks that GLM5.2 handles fine.

As I type this my local GLM5.2 is troubleshooting bugs that Qwen would not be able to handle.

zkmon (OP) · 21h ago

Not sure how much of your GLM is offloaded to CPU. I was contesting the suggestion of using system RAM + VRAM.
