Local models sound great until you realize you don't get a lot of the features we implicitly expect from hosted models. Getting to a comparable system would have required additional investment in operations and setup: rolling our own memory system, harnesses for the model, compliance, and security. It was possible for us to invest in this, but it would have meant hiring or training to reach a state comparable to the hosted options.
Eventually, I had to recommend against the project, as it was more likely to be an investment in the leading team's resumes than an actual investment in our organization.
Your last paragraph hints at retention struggles, which complicates the issue.
But was vendor-risk mitigation not part of the evaluation? I get that most companies view governance and compliance as a pay-to-play issue, but rapidly changing markets and single-source suppliers have always been a risky combination.
I admit to having my own preferences and being almost completely ignorant about what your needs are, but I have seen the value in having a rabbit to pull out of the hat.
If your retention situation means individual departures cause a complete loss of institutional knowledge, I guess my position wouldn't hold.
But during the rise of cloud computing, I introduced an OpenStack install in our sandbox, not because I thought we would stay on a private cloud, but because it let our team pull back the covers and understand what our cloud vendor was doing.
It was an adoption accelerator: it enabled us to choose an appropriate vendor and to avoid the long tail of implementation.
It was valuable as a pivot when AMD killed SeaMicro on short notice, and the full cloud migration period was dramatically shortened.
I have a dozen other examples, but it is like stock options: volatility and uncertainty dramatically increase the value of keeping your options open.
We will have vendors fold, and a single-source story couples your org to that vendor's success.
IMHO there is a huge difference between tying your success to an Oracle, which may be 'safe' if expensive for a captive customer, and doing the same in uncertain markets.
Would you be willing (or able) to share more?
Better to take the risk for most things. If the worst case happens and you have to migrate, you migrate. Otherwise you risk overengineering upfront and guaranteeing reduced productivity instead of merely risking it.
The point is not avoiding vendors or duplicating everything. The point is designing systems so the software/platform never becomes the point of control.
A self-hosted, minimal sandbox instance using simple containers and tools is one way to help avoid that lock-in trap.
It is not zero-cost, but it is strategically important to make sure that vendors support your enterprise rather than shape it.
IMHO systems should be designed to be as replaceable as possible, without the extreme complexity that, for example, a true 'multi-cloud' solution would add.
The point is that the vendor and/or platform can be replaced any time the business changes its goals, the market shifts, or strategies change.
Keeping the door open and trying to minimize the migration cost is my point, not boiling the ocean.
Repurposing a decommissioned server or desktop with a GPU (a 3090 or RTX PRO 6000 Blackwell, not DC-class) running Linux/Podman and llama.cpp will help a team understand without much cost, as in the sketch below, but that claim is made in ignorance of your situation.
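To make that concrete, here is a minimal sketch, assuming llama.cpp's llama-server (which exposes an OpenAI-compatible chat endpoint); the model path, port, and prompt are placeholders:

    # Assumes a local llama.cpp server started roughly like:
    #   llama-server -m /models/your-model.gguf --port 8080
    # llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "local",  # llama-server accepts an arbitrary model name
            "messages": [{"role": "user", "content": "Summarize why single-source vendors are risky."}],
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])

The same request body works unchanged against a hosted OpenAI-compatible endpoint, which is exactly the kind of replaceability being argued for: the endpoint URL is the only coupling point.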
We both very much agree that upfront multi-vendor implementations are a very bad idea. They suffer from the same problem, IMHO: trying to plan past the planning horizon around aspects you have no control over.
Probably too much nuance to discuss here, but thanks for responding.
That's not local models vs. hosted models; that's using the enterprise services from Anthropic. Any local LLM inference engine, such as vLLM, gives you an OpenAI-compatible API with the exact same features as a hosted model.
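For illustration, a minimal sketch of that drop-in compatibility, assuming a vLLM server on its default port 8000; the model name is a placeholder:

    # Assumes the server was started with something like:
    #   vllm serve mistralai/Mistral-7B-Instruct-v0.3
    from openai import OpenAI

    # Same client library you would use against a hosted endpoint; only the
    # base_url changes (the API key is a dummy for a local server).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.3",
        messages=[{"role": "user", "content": "Hello from a local model."}],
    )
    print(resp.choices[0].message.content)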
I'm not sure what your use case is, but I personally found Anthropic's offerings lacking and inferior to open-source or custom-built solutions. I have yet to see any "memory" system that's better than markdown files or search, and harnesses for agentic AIs are a dime a dozen.
That level of hardware, if the performance were enough, is a much smaller investment and gamble.
Either way, I understand the decision. Your product isn't locally hosted LLMs, so why fuss? That said, when I see a million-plus in external spend, I start wondering about the options. I'm not saying you did the wrong thing; I think you did the right thing. But things seem to be changing on the local-model front, and quite rapidly.
The people who don't really know what they're doing (or don't care) need the full power of the SOTA models; those with experience can provide enough context and instruction to make even small local models work.