
0 points · h14h · 25d ago · 0 comments

IMO it's only a matter of time before "self-hosting local AI" is only as complicated as installing an app and clicking a download button.

And when that happens, the pitch to non-techy users is "Free ChatGPT you can use offline with zero privacy risk". Once hardware accessibility and LLM efficiency advance to the point that this becomes feasible, I suspect it'll result in a much bigger hit to the cloud AI market than many expect.


8 comments · 2 top-level

adamrezich · 25d ago · 4 in thread

Why is it only a matter of time? The AI-as-a-service companies are going to continue to improve their products by improving both the part that could be reproduced in a self-hosted setup and the “secret sauce” they put on top of that to make it a better product. There is no incentive for this “secret sauce” to be something that can be reproduced for self-hosting, is there?

thewebguyd · 25d ago

What secret sauce? We already have open source tooling for tool use, web browsing, and code execution/computer use. Open weight models will win in the end.

AIaaS might keep an edge with multi-modal agentic workflows, but for 80% of general use cases no "secret sauce" is needed: the open weight models are already there, and tooling is constantly getting better.

The bottleneck is the cost of local hardware right now.

Shitty-kitty · 25d ago

The "secret sauce" is vendor lock-in. A textbook case is the VMware/Broadcom situation. VMware was cheap, so corporations found little reason to use open source. Broadcom made VMware expensive, and now those corporations are finding out that it is a lot of work (aka expensive) to switch infrastructure.


h14h (OP) · 25d ago

I think a major incentive could be to sell hardware. If Apple is able to get their hands on a local LLM capable of covering a significant % of what people use ChatGPT for, the pitch they can offer is:

"Free, private, offline ChatGPT so long as your laptop has X GB of RAM"

Beyond that, I wouldn't underestimate the incentive of "because I can". The "secret sauce" you refer to is effectively just a DB & a while loop that feeds text to a bunch of tensors. If an indie dev decides they want to release something that dismantles the OpenAI & Anthropic moats, there really isn't all that big of a technical barrier stopping them.

bigyabai · 25d ago

LLM inference decode is heavily dependent on memory speed, not just having lots of memory. You can't say "X amount of ram" because the memory bandwidth on an M1 is 68.3 GB/s versus the 614 GB/s of an M5 Max, or a 4090's 1.01 TB/s over GDDR6X.

This basically creates a bottleneck at the oldest/cheapest Apple Silicon machines, which are already crippled for context prefill.
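To put rough numbers on the bandwidth point: a common back-of-the-envelope estimate (an assumption here, not something measured) is that each decoded token has to stream every weight from memory once, so decode speed is capped at roughly memory bandwidth divided by model size in bytes. A minimal Python sketch, using a hypothetical 8B-parameter model quantized to 4 bits and the bandwidth figures quoted above:

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s: float,
                                  params_billions: float,
                                  bits_per_weight: float) -> float:
    """Upper bound on decode speed, assuming every weight is read once per token.

    Ignores KV-cache reads, compute limits, and other overhead, so real
    throughput will be lower than this ceiling.
    """
    model_gb = params_billions * bits_per_weight / 8  # size of the weights in GB
    return bandwidth_gb_s / model_gb


# Bandwidth figures as quoted in the comment above (GB/s).
machines = {"M1": 68.3, "M5 Max": 614.0, "RTX 4090": 1010.0}

# Hypothetical model: 8B parameters at 4 bits per weight (about 4 GB of weights).
ceilings = {
    name: decode_tokens_per_sec_ceiling(bw, params_billions=8, bits_per_weight=4)
    for name, bw in machines.items()
}

for name, tps in ceilings.items():
    print(f"{name}: ~{tps:.0f} tokens/s at best")
```

Under these assumptions the same amount of RAM gives very different ceilings depending on the machine, which is why "X GB of RAM" alone undersells the hardware requirement.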


ribosometronome · 25d ago · 2 in thread

That workflow has been around for a while now. I'm sure there are others, but LM Studio has an in-app model browser that effectively simplifies things to hitting download and hitting launch. The complexity tends to come from having a lot of models to choose from and from knowing how to set up whatever tool you're using with a local model. None of it is particularly hard, unless you start trying to customize settings.

I think the bigger hang-up is that local models are still slower and less capable than the frontier models, especially at the hardware specs most home users are likely to have.

h14h (OP) · 24d ago

The performance hangup is definitely a barrier, but I think LM Studio and other similar apps are still too far on the "techy" end of the spectrum and have UX barriers that will need to be addressed. IMO for most people, exposing things even as "basic" as the official model name is a leaky abstraction that could be overwhelming.

If the first thing (for example) my mom sees upon installing the app is a dropdown model picker that contains things like "Qwen3.6-35b-a3b-mlx" she will 100% be bouncing off of it.

IMO the best version of this is a custom app/harness with a couple of pre-selected (and ideally fine-tuned) open models that immediately start downloading after checking the system's hardware specs. This would likely be a turn-off to most devs, but is absolutely essential if building an app for general consumers.
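The "check the hardware, then just start downloading" flow could look something like the sketch below. Everything in it is hypothetical: the catalog entries, their RAM thresholds, and the function name are made up for illustration, and a real app would also need to detect RAM and GPU details itself rather than take them as an argument.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str          # internal identifier, never shown to the user
    min_ram_gb: float  # hypothetical minimum RAM to run comfortably


# Hypothetical catalog, ordered from most to least capable.
CATALOG = [
    ModelOption("large-assistant", 32),
    ModelOption("medium-assistant", 16),
    ModelOption("small-assistant", 8),
]


def pick_model(ram_gb: float) -> Optional[ModelOption]:
    """Return the most capable catalog entry that fits, or None if none fit."""
    for option in CATALOG:
        if ram_gb >= option.min_ram_gb:
            return option
    return None


choice = pick_model(ram_gb=16)
print(choice.name if choice else "This machine can't run a local model")
```

The user never sees a model picker; the app would only surface a message when nothing in the catalog fits the machine.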

selicos · 24d ago

LM Studio Link is brilliant, aside from their central login/auth requirement. Tailscale is the backbone, I think, so it makes sense, but I'm sure a method with WireGuard could exist and enable similar performance.

The current dilemma for me is: how do I install a model on a remote LM Studio device without bypassing LM Studio to SSH or remote in?

> lms link [servername] get model ?

> lms get [servername] model ?

> lms get model --link [servername] ?

Maybe I need to read the docs again, but I swear the only way is to remote in or go to that device and download via the GUI, or SSH in and use the local CLI.

Maybe I can copy/paste from one device's downloads dir to the server? Maybe I need to try hosting models on my NAS and see if I can download from device 1 and then run on device 2 without install/setup?
