"99% written by DeepSeek-R1" according to the author.
To be fair, torch didn't try very hard to optimize on CPU either.
[0] 200MB is actually a very generous number. I tried to download some AI thing via pip3 the other day and it wanted 600MB or so of CUDA stuff. Meanwhile, I don't even have an Nvidia GPU.
Fwiw, there are always many attempts at optimizing code (assembly etc). This is good! Great to try new techniques. However, you get what you constrain. I've seen "optimized" code that drops checks the standard requires, according to the compiler authors. So if you don't explicitly tell your optimizer "this is a case I care about, this is the desired output", it will ignore that case.
Did we find a faster implementation than the compiler creates? Well, I mean, sure, if you don't know why the compiler is doing what it's doing.