Better HN

0 points · zozbot234 · 20d ago · 0 comments

The DS4 author has demoed upcoming work on Strix Halo that makes it roughly competitive with the Apple Silicon equivalent (i.e. Pro models with similar memory bandwidth figures, not Max or Ultra). Maybe even a bit faster for prefill, and with further potential for running small batches in parallel (since the GPU clearly has some amount of compute headroom during decode).


4 comments · 1 top-level

SwellJoe · 19d ago · 3 in thread

As far as I can tell you'll have a context limit of about 64k, which is also prohibitive for serious work. (My benchmark peaks at about 90k tokens of context while running, so I'm giving the self-hosted models 128k to leave plenty of wiggle room.)

But, still, it's cool that the work is happening. For some classes of problem it might be an option, and when the 192GB Strix Halo comes out, DS4 will probably become a real contender for self-hosting champ, as that leaves enough memory for a big context.

zozbot234 (OP) · 19d ago

> As far as I can tell you'll have a context limit of about 64k

Source? The author has demoed a 100k ctx already, and I can't think of a reason why more wouldn't be supported. RAM is a bit tight but that only matters with really long contexts on DeepSeek V4, and proper support for SSD streaming would address this anyway.

BTW, the official support is now merged too.

SwellJoe · 19d ago

OK, I just tried it with the new mainline ROCm and MTP support, and it is faster, but still uncomfortably slow for interactive coding agent use. It does about 14-15 t/s, which is faster than the 10-11 t/s I was seeing before, but still a crawl. I set it loose on a small 300-line Perl file, and it's still chewing several minutes later.

So, it's super cool that such a solid model can run locally and it's probably useful for batched work overnight. But, I'm not going to sit around twiddling my thumbs while working. I think I can write code by hand faster than this. I'll gladly pay for a cloud model so I don't have to wait (especially since DeepSeek models are so cheap).
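For a rough sense of what those decode rates mean in wall-clock time, here is a minimal sketch. It uses the midpoints of the quoted ranges (10.5 and 14.5 t/s) and a purely hypothetical output size of 5,000 tokens for one agent task. It ignores prefill time and tool-call overhead, so real runs would take longer.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to generate output_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Midpoints of the rates quoted in the comment above.
OLD_TPS = 10.5
NEW_TPS = 14.5

# Hypothetical output size for one agent task (an assumption, not a measurement).
ASSUMED_OUTPUT_TOKENS = 5_000

old_s = decode_seconds(ASSUMED_OUTPUT_TOKENS, OLD_TPS)
new_s = decode_seconds(ASSUMED_OUTPUT_TOKENS, NEW_TPS)
speedup = NEW_TPS / OLD_TPS

print(f"old: {old_s / 60:.1f} min, new: {new_s / 60:.1f} min, speedup: {speedup:.2f}x")
```

The speedup works out to roughly 1.4x, which is noticeable but still leaves a multi-minute wait under these assumptions.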


SwellJoe · 19d ago

No source, just back of the envelope math. 100k seems optimistic, but I guess I'll try it and see. That would be usable for at least a few use cases, including the security scanning work I'm focused on at the moment (at least, so far, the peak token usage has been 90k, which would make 100k tight but probably fine).
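The headroom arithmetic behind "tight but probably fine" can be sketched like this. It compares the 90k peak usage mentioned in the thread against the three context sizes discussed (64k, 100k, 128k), and assumes "k" means 1,000 tokens rather than 1,024. The function name is just illustrative.

```python
def headroom_tokens(context_window: int, peak_usage: int) -> int:
    """Spare tokens left in the window at peak usage (negative means it does not fit)."""
    return context_window - peak_usage


PEAK = 90_000  # peak token usage reported in the thread

for window in (64_000, 100_000, 128_000):
    spare = headroom_tokens(window, PEAK)
    status = "fits" if spare >= 0 else "does not fit"
    print(f"{window:>7} ctx: {status}, spare {spare:+} tokens ({spare / window:+.0%})")
```

Under these assumptions a 100k window leaves 10k tokens spare, while 64k falls 26k short of the observed peak.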
