As a developer the amount of effort I'm likely to spend on the infra side of getting the model onto the user's computer and getting it running is now FAR FAR below the amount of time I'll spend developing the app itself or getting together a dataset to tune the model I want etc. Inference is solved enough. "getting the correct small enough model" is something that I would spend the day or two thinking about/testing when building something regardless. It's not hard to check how much VRAM someone has and get the right model, the decision tree for that will have like 4 branches. It's just so little effort compared to everything else you're going to have to do to deliver something of value to someone. Especially in the set of users that have a good reason to run locally.
No comments yet.