
0 points · coreyp_1 · 13d ago · 0 comments

With a 128 GB Strix Halo, you can't run as big a model as you might think. You can go larger than you could with a single graphics card, of course, but the 128 GB cannot all be dedicated to the model. Remember, the context alone is usually larger than the model itself. I got an EVO X2 and I don't regret it, but by my current calculations it will take 8 years to recoup the cost compared with just using equivalent paid commercial options.

2 comments · 2 top-level

smilekzs · 13d ago

A key consideration in favor of running your own local LLM despite all the trouble: the commercial serving endpoint may not exist tomorrow, or at least not at the same price.

hedora · 13d ago

My current rule of thumb is that 1 GB gets you 1B parameters with a big context (Qwen 32B fits in 32 GB with 200K+ token contexts).

That’s with heavy compression of the weights and the context, of course.
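For anyone who wants to sanity-check that rule of thumb, here is a rough back-of-the-envelope sketch. It only counts the quantized weights plus the KV cache and ignores activations, runtime overhead, and whatever the OS keeps for itself. The function names are made up for illustration, and the architecture numbers in the example (64 layers, 8 KV heads, head dimension 128) and the 4-bit quantization are assumptions for a 32B-class model with grouped-query attention, not figures from this thread.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights at a given quantization."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bits_per_value: float) -> float:
    """Approximate KV cache size for one sequence.

    The factor 2 covers one key and one value vector per layer per token.
    """
    n_values = 2 * n_layers * n_kv_heads * head_dim * context_tokens
    return n_values * bits_per_value / 8 / GIB


if __name__ == "__main__":
    # Assumed (hypothetical) 32B-class configuration, 200K-token context.
    w = weights_gib(32, bits_per_weight=4)
    kv_q4 = kv_cache_gib(64, 8, 128, 200_000, bits_per_value=4)
    kv_f16 = kv_cache_gib(64, 8, 128, 200_000, bits_per_value=16)
    print(f"weights @ 4-bit:   {w:.1f} GiB")
    print(f"KV cache @ 4-bit:  {kv_q4:.1f} GiB")
    print(f"KV cache @ 16-bit: {kv_f16:.1f} GiB")
```

Under those assumptions the 4-bit weights plus a 4-bit KV cache land under 32 GB, while an uncompressed 16-bit cache at the same context length would be several times larger than the weights, which is the effect described upthread.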

I haven’t gone through model evaluation + shoehorning at 128GiB yet.
