Local models sound great until you realize you don't get a lot of the features we implicitly expect from hosted models. Getting to a comparable system would have required additional investment in operations and setup: rolling our own memory system, harnesses for the model, compliance, and security. It was possible for us to invest in this, but it would have meant hiring or training to reach a state comparable to the hosted options.
Eventually, I had to recommend against the project, as it was more likely to be an investment in the leading team's resumes than an actual investment in our organization.
Your last paragraph hints at retention struggles, which complicates the issue.
But was vendor-risk mitigation not part of the evaluation? I get that most companies view governance and compliance as a pay-to-play issue, but rapidly changing markets and single-source suppliers have always been a risky combination.
I admit to having my own preferences and being almost completely ignorant about what your needs are, but I have seen the value in having a rabbit to pull out of the hat.
If your retention situation means individual departures cause a complete loss of institutional knowledge, I guess my position wouldn't hold.
But during the rise of cloud computing, I introduced an OpenStack install in our sandbox, not because I thought we would stay on a private cloud, but because it let our team pull back the covers and understand what our cloud vendor was doing.
It was an adoption accelerator: it enabled us to choose an appropriate vendor and to avoid the long tail of implementation.
It was valuable as a pivot when AMD killed SeaMicro on short notice, and the full cloud migration period was dramatically shortened.
I have a dozen other examples, but it is like stock options: volatility and uncertainty dramatically increase the value of keeping your options open.
We will have vendors fold, and a single-source story couples your org to that vendor's success.
IMHO there is a huge difference between tying your success to an Oracle, which may be 'safe' if expensive for a captive customer, and doing the same in uncertain markets.
Would you be willing (or able) to share more?
Better to take the risk for most things. If the worst case happens and you have to migrate, you migrate. Otherwise you risk overengineering upfront and guaranteeing reduced productivity instead of merely risking it.
The point is not avoiding vendors or duplicating everything. The point is designing systems so the software/platform never becomes the point of control.
A self-hosted, minimal sandbox instance using simple containers and tools is one way to help avoid that lock-in trap.
It is not zero-cost, but it is strategically important to make sure that vendors support your enterprise rather than shape it.
IMHO systems should be designed to be as replaceable as possible, without the extreme complexity that, for example, a true 'multi-cloud' solution would add.
The point is that the vendor and/or platform can be replaced any time the business changes its goals, the market shifts, or strategies change.
Keeping the door open and trying to minimize the migration cost is my point, not boiling the ocean.
Repurposing a decommissioned server or desktop with a GPU (a 3090 or RTX PRO 6000 Blackwell, not DC-class) running Linux/Podman and llama.cpp will help a team understand without much cost, as in the sketch below, but that claim is made in ignorance of your situation.
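To make that concrete, here is a minimal sketch, assuming llama.cpp's llama-server (which exposes an OpenAI-compatible chat endpoint); the model path, port, and prompt are placeholders:

    # Assumes a local llama.cpp server started roughly like:
    #   llama-server -m /models/your-model.gguf --port 8080
    # llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "local",  # llama-server accepts an arbitrary model name
            "messages": [{"role": "user", "content": "Summarize why single-source vendors are risky."}],
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])

The same request body works unchanged against a hosted OpenAI-compatible endpoint, which is exactly the kind of replaceability being argued for: the endpoint URL is the only coupling point.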
We both very much agree that upfront multi-vendor implementations are a very bad idea. They suffer from the same problem, IMHO: trying to plan past the planning horizon around aspects you have no control over.
Probably too much nuance to discuss here, but thanks for responding.
That's not local models vs. hosted models; that's using the enterprise services from Anthropic. Any local LLM inference engine, such as vLLM, gives you an OpenAI-compatible API with the exact same features as a hosted model.
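For illustration, a minimal sketch of that drop-in compatibility, assuming a vLLM server on its default port 8000; the model name is a placeholder:

    # Assumes the server was started with something like:
    #   vllm serve mistralai/Mistral-7B-Instruct-v0.3
    from openai import OpenAI

    # Same client library you would use against a hosted endpoint; only the
    # base_url changes (the API key is a dummy for a local server).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="mistralai/Mistral-7B-Instruct-v0.3",
        messages=[{"role": "user", "content": "Hello from a local model."}],
    )
    print(resp.choices[0].message.content)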
I'm not sure what your use case is, but I personally found Anthropic's offerings lacking and inferior to open-source or custom-built solutions. I have yet to see any "memory" system that's better than markdown files or search, and harnesses for agentic AIs are a dime a dozen.
That level of hardware, if the performance were enough, is a much smaller investment and gamble.
Either way, I understand the decision. Your product isn't locally hosted LLMs, so why fuss? That said, when I see a million-plus in external spend, I start wondering about the options. I'm not saying you did the wrong thing; I think you did the right thing. But things seem to be changing on the local-model front, and quite rapidly.
The people who don't really know what they're doing (or don't care) need the full power of the SOTA models; those with experience can provide enough context and instruction to make even small local models work.