undefined | Better HN

0 pointsbrucethemoose22y ago0 comments

Desktop users are not going to be happy about a surprise local model running in the background, even a small quantized one.

0 comments

chatmasta2y ago

So don't make it a surprise then. Besides, I'd be much less happy about my post-authenticated content being sent to a surprise cloud model...

rockemsockem2y ago

I'm curious, why do you think that?

chankstein382y ago

Because it seems like, regardless of the announcement, there will always be someone who has the most niche issue with it and manages to make assertions for an entire group of people while only really referencing their personal experience ("and all of the people they know").

brucethemoose2OP2y ago

I mean, I am the strongest local LLM advocate you will find. I have my GPU loaded with a model pretty much all day, for recreation and work. My job, my livelihood involves running local LLMs.

But it's intense, even with a very finicky, efficient runtime on a strong desktop. Local LLM hosting is not something you want to impose on users unless they are acutely aware of it, or unless its a full stack hardware/software platform (like the Google Pixel) where the vendor can "hide" the undesirable effects on system performance.

I think that's a reasonable generalization to make.

chankstein382y ago

Fair but google does, _supposedly_ have a Gemini model meant to run on phones so it'd presumably be small enough that it wouldn't necessarily be a massive problem. Or, at least, we could get there eventually. Not arguing at this point, you're right. I just think over time we could get there

brucethemoose2OP2y ago

Running "smart" LLMs locally takes a lot of RAM, a lot of compute, and a lot of disk space.

It produces a considerable amount of heat unless it's run on an NPU, which basically doesn't happen on desktops at the moment.

Hot loading/unloading it can be slow even on an SSD.

Users often multitask with chrome in the background, and I think many would be very displeased to find Chrome bogging down their computer for reasons they may not be aware of.

Theoretically Google could run a very small (less than 2B?) LLM with very fast quantization, and maybe even work out how to use desktop NPUs, but that would be one heck of an engineering feat to deploy on the scale of Chrome.

rockemsockem2y ago

Honestly that sounds extremely feasible, especially for a feature that isn't on by default. The one the parent comment references in Arc isn't on by default. Also chrome eating up system resources is already a meme and they've been working on using less by sleeping tabs.

j / k navigate · click thread line to collapse

0 comments

chatmasta2y ago

So don't make it a surprise then. Besides, I'd be much less happy about my post-authenticated content being sent to a surprise cloud model...

rockemsockem2y ago

I'm curious, why do you think that?

chankstein382y ago

brucethemoose2OP2y ago

I mean, I am the strongest local LLM advocate you will find. I have my GPU loaded with a model pretty much all day, for recreation and work. My job, my livelihood involves running local LLMs.

I think that's a reasonable generalization to make.

chankstein382y ago

brucethemoose2OP2y ago

Running "smart" LLMs locally takes a lot of RAM, a lot of compute, and a lot of disk space.

It produces a considerable amount of heat unless it's run on an NPU, which basically doesn't happen on desktops at the moment.

Hot loading/unloading it can be slow even on an SSD.

Users often multitask with chrome in the background, and I think many would be very displeased to find Chrome bogging down their computer for reasons they may not be aware of.

rockemsockem2y ago

j / k navigate · click thread line to collapse