There is minimal downside to switching to open models (opens in new tab)

(marble.onl)

398 pointsamarble6d ago316 comments

316 comments

158 comments · 36 top-level

coffinbirth5d ago· 43 in thread

> Open models are served via various means, some by the companies that released them and some by third parties like OpenRouter. Unfortunately, both of these routes are dodgier in terms of privacy and data sharing, and I would not feel the same comfort sending API calls containing client or confidential data to them.

That's why I'm using eurouter.ai with the following routing rule for all my requests:

  {
    "model": "glm-5.2",
    "models": [
      "deepseek-v4-pro",
      "deepseek-v4-flash"
    ],
    "provider": {
      "allow_fallbacks": true,
      "data_collection": "deny",
      "data_residency": "EU",
      "max_retention_days": 0,
      "eu_owned": true
    }
  }

Sure, it's quite expensive, but at least on a legal side data privacy is ensured. I trust them more than e.g. Anthropic, OpenAI or OpenRouter.

Personally, I find it morally unacceptable to use U.S. AI tools, because I do not want to support them financially and thus support the crimes they are involved in[1].

[1]: https://news.ycombinator.com/item?id=48512339

himata41135d ago

The part that gets me about anthropic red lines is "of Americans", okay so the rest of the civilized world is up for grabs then? It's okay to destabalize allies with sabotaged tests (in machine learning) and data exfiltration outside America?

What gets me the most is that they claim that the model should follow the https://www.anthropic.com/constitution and they claim that it's embedded into the model. However, system prompts in claude code and cowork re-iterate all of these points and if they're embedded you shouldn't need to do that. Now, if you ask the API version of claude to be a hitler supporter with enough prompt engineering it will become one which directly contradicts what they claim to do, opus 4.7 specifically will be happy to create anti-(insert minority group) propaganda although I haven't had the same success with 4.8 thus far, but I also haven't been motivated enough to push it in that direction yet since I've been more interested in exploting the cyber capabilities of the model.

My conclusion from the very start is that Anthropic's strategy are pure optics and considering the fact that there was an outpoor of support for the company I think it has been very successful.

dminik5d ago

Yeah, it was funny seeing a bunch of people going like "Anthropic is fighting for privacy" meanwhile I'm like "Uhh, what about the other 8 billion people?"

On second thought, it's not funny.

beng-nl5d ago

As a thought experiment - such shocks (govt pressure to use models for bad purposes and govt excluding access to non-Americans) coming early in the ‘ai revolution’ will wake up the rest of the world sooner that they have to get their act together to stay competitive without relying on USA. Just like with nato.

nerdsniper5d ago

> The part that gets me about anthropic red lines is "of Americans", okay so the rest of the civilized world is up for grabs then? It's okay to destabalize allies with sabotaged tests (in machine learning) and data exfiltration outside America?

Regardless of Anthropic's "moral" position (inasmuch as a corporation can even have morals) against spying on non-Americans, they would have no way to enforce that limitation against the government because non-citizens outside of the USA have no protections from the intrusions of the US government.

cbolton5d ago

They can include these limitations in a contract which can be enforced like any contract.

2 more replies

oefrha5d ago

> anthropic red lines

Alleged red lines. Could be just talking points for garnering sympathy. Big tech aren’t exactly known for being truthful, especially big tech partnering with esteemed Palantir.

1 more reply

throwaw125d ago

> The part that gets me about anthropic red lines is "of Americans", okay so the rest of the civilized world is up for grabs then?

And this is coming from a CEO who constantly claims moral superiority and advances the idea that China is bad

avadodin5d ago

These companies are so good at selling their product's likely incompetence as possibly intentional subversion.

johndough5d ago

I had a look at eurouter.ai and it seems like an extremely bad offer.

- The prices are ridiculous (15 % markup for free account).

- They have a rate limit of 1000 requests per month, unless you pay 40€ per month for ... what exactly is their value proposition?

- They have a single provider (TensorX) for DeepSeek-V4-Pro, with a cache read cost that is over 100 times higher than DeepSeek ($0.44 vs $0.003625). Notably, I had to look at the TensorX website for that information, since I could not find any information about cached token cost on eurorouter.ai.

qznc5d ago

I guess the prices are for "EU owned" instead of "EU hosted". The data centers in the EU where you can rent GPUs is mostly US companies.

trollbridge5d ago

It looks like a business opportunity, then, to provide inference that is EU-local and/or EU-owned.

If there aren't enough businesses who want to do this, the EU should figure out how it can properly incentivise that to change.

imhoguy5d ago

Hosting anything in EU must cover redtape and carbon taxes in electricity bill.

Grombobulous5d ago

That seems pretty unsubstantiated. Hetzner proves that EU data center != expensive.

Low carbon does not equal expensive, either. Solar is the cheapest power generation method. Solar plus grid scale batteries is in the same cost ballpark as natural gas.

There’s nothing about data centers that is inherently a high carbon business. It’s only a high carbon business in places like the US where political leadership purposefully fights against renewable energy projects that private businesses want to undertake on their own dime.

jampekka5d ago

The markup is not going to the providers, only the router. It seems more like eurouter found a niche it can milk for a while.

KronisLV5d ago

Actually got curious about other alternatives to OpenRouter and looked into it a bit.

EURouter (Amsterdam): https://www.eurouter.ai/pricing

Eden AI (France): https://www.edenai.co/pricing

nexos.ai (Lithuania): https://nexos.ai/pricing/

Requesty (Germany): https://www.requesty.ai/pricing

Cortecs (Austria): https://cortecs.ai/pricing

Nordference (Estonia): https://nordference.ai/pricing

Guess those are really popping up as mushrooms, eh? Not an endorsement of any of those on my part cause I haven't personally used them, but seems like there are at least options for those who need them.

root-parent5d ago

Crimes does not even starts to describe it:

"AI-assisted targeting in the Gaza Strip" - https://en.wikipedia.org/wiki/AI-assisted_targeting_in_the_G...

"Palantir allegedly enables Israel's AI targeting in Gaza, raising concerns over war crimes" - https://www.business-humanrights.org/de/neuste-meldungen/pal...

"What The Wounds Are Telling Us" - https://www.volkskrant.nl/kijkverder/v/2025/gunshot-palestin...

bandrami5d ago

If data security is an actual concern I don't think there's a solution other than biting the bullet and self-hosting.

fg1375d ago

If your only concern is data residency, data privacy and sharing, why not just use bedrock with the processing region locked to eu-west-2? For sure, it's not an European company serving the LLM, but it satisfies your requirements otherwise and is trusted by tons of companies worldwide.

helloplanets5d ago

Anthropic already explicitly communicated that they'll store and check all the data from Bedrock or any platform, even if you've selected zero data retention, if using Mythos class models. To use these models on any platform, you'll have to accept these terms regardless of the region.

> Limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered.

> Change applies to organizations that have set up workspaces with zero data retention (ZDR) in Claude Console, use Claude Code with ZDR in Claude Enterprise, or access Claude through AWS Bedrock, Google Cloud Agent Platform, or Microsoft Foundry with ZDR.

https://support.claude.com/en/articles/15425996-data-retenti...

fg1375d ago

The original comment is about GLM/deepseek models. As you already pointed out, this applies if you use those specific Claude models on any platform, so I don't know what the point is.

quikoa5d ago

> it's not an European company serving the LLM

That's a pretty big downside if data privacy and sharing is one of the main concerns.

fg1375d ago

I'd like to see some real reasoning here that is based on facts.

1 more reply

trollbridge5d ago

The great part about open models is that you can do this.

Do you have a sound reason to need EU data locality? You can.

Do you want the confidence (and are willing to accept the expense) of only running models on local hardware you control? You can.

Do you want the cheapest possible option - choosing a Chinese, for example, provider, or perhaps a provider offering it for free where you agree they can use your prompts? You can.

Do you need to comply with some kind of regulation like GDPR or rules for contracting with the U.S. federal government? No problem. (Although I'm still waiting for DeepSeek V4 to show up on Amazon BedRock so it can be used from GovCloud...)

Do you have moral objections and want to actually live by them? You can.

ttoinou5d ago

You dont care about which exact provider it is using behind the hood ?

Phlogi5d ago

No, as long as they follow the requirements, especially the data privacy agreements. What would you? Price?

fredoliveira5d ago

Output quality immediately comes to mind, of course.

Models are converging, but they converge in bands, and frontier is frontier. I would not like to have any workflows in any area of my business where output is generated by an assortment of models from different providers. For trivial, mundane tasks that might be fine, but it certainly doesn't apply across the board.

1 more reply

ttoinou5d ago

How do you know they're following requirements ? At least a quick search about the company providing the service would be useful

1 more reply

codedokode5d ago

Not only it requires a minimum payment of 39 euro, it doesn't accept cryptocurrency althogh that can be worked around by buying a prepaid virtual card for crypto.

yogorenapan5d ago

What services give you a prepaid virtual card for crypto without KYC?

WhyNotHugo5d ago

You only need to worry about GDPR and the hoster being in the EU if you're giving the model actual access to production data — which you shouldn't anyway. Use the model to write code that processes or analyses the data, so that process can easily be reproduced with deterministic results.

simianwords5d ago

GDPR compliant llm was a joke a few months back but here we are

speedgoose5d ago

I work in Europe, sometimes with sensitive data, and LLMs weren’t an exception a few months ago.

Maybe it was funny to you, but designing data platforms that respect GDPR and involve LLMs is a thing.

throw12345678915d ago

But is no joke anymore.

vonneumannstan5d ago

Lol what performative shite. Chinese astroturfing 101. You're either mentally ill or a shill.

throwaway274485d ago

Why use EU specifically? I get not trusting the US, of course, but surely the EU isn't far behind in its desire to spy on its own citizens. Do you not live there?

earthnail5d ago

From all the large governmental institutions, the EU is the one currently holding up traditional western values. That gives it street cred in this subject.

kortilla5d ago

>traditional western values

This seems tautological because Europe is pretty weak on the values that people in the US might care about (freedom of speech, limited govt, etc).

What values specifically are you optimizing for here?

2 more replies

throwaway274485d ago

I'm honestly not really sure what "traditional western values" have to do with where to store data. What does that even refer to—individualism? Christianity? Representation in court by lawyers? How does this intersect with the topic at hand?

Edit: c'mon people, if you're going to use such ambiguous phrases at least have the spine to clue the reader in to what you want them to refer to in this context.

1 more reply

hdgvhicv5d ago

https://www.theguardian.com/us-news/ng-interactive/2026/feb/...

The age old joke;

A Russian and an American are drinking at a bar

The Russian says "I'm impressed by american propaganda. It's so subtle but effective."

The american responds "What are you talking about, we don't do propaganda."

2 more replies

cpursley5d ago

With all the issues in the US and generally wrong direction, I can’t remember them ever arresting people for mean tweets in the way that Germany and the UK have. They all seem to be running full speed towards a surveillance state.

8 more replies

0xDEAFBEAD5d ago

"The situation for free speech in Europe is even worse than I thought"

https://eternallyradicalidea.com/p/the-situation-for-free-sp...

1 more reply

Phlogi5d ago

US Data Privacy is not sufficient.

throwaway274485d ago

For what? Does the EU not want to spy on its citizens? That strikes me as... unlikely.

Why not host in east asia? Or southeast asia? Or south america? Or africa? Then you avoid both the government with incentive to spy on you (assuming you live in the EU) and american companies.

3 more replies

julianlam6d ago· 32 in thread

I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models.

I know LLMs move at the speed of light (especially these past few quarters), but if Opus and GPT "a few months ago" were really like open weight models, then there's really no reason to not switch, especially for those who were using these models a few months ago.

Your codebase didn't change, so use the open weight model. Don't move the goalposts.

kgeist5d ago

Every new proprietary model is "groundbreaking" and "look, it just solved task X that no other model could solve," only to be referred to as "that crappy previous-generation model" a month later.

So yeah, I'm totally fine using Kimi-2.7, GLM-5.2 or Deepseek-v4. I think we've already hit the ceiling and most improvements now seem to be from harness improvements and slightly better RL to improve reasoning/tool calling.

jbverschoor5d ago

Not only that, but to me it seems that after a week the intelligence is being downscaled or routed. Maybe because of lack of capacity

matheusmoreira5d ago

There's at least the possibility that they intentionally degrade the models as time passes. We can't really verify that we're getting what we're paying for all of the time. All the more reason to invest in local inference.

inigyou5d ago

What if the new model is exactly as good as the last model on launch day but better than the last model was on the new model's launch day because it was degraded? Every single time?

3 more replies

manyatoms5d ago

Unless what you're getting is really explicitly spelled out in a contract, you should flatly assume that they're doing whatever they like whenever they like.

1 more reply

LPisGood5d ago

People talk about this a lot. What I have never seen is a discussion of methods they might employ to degrade the models.

Let’s say I’m a bad faith LLM operator, and I want to degrade my model so the next release looks better and people want to switch to the more expensive one. How would I do that?

3 more replies

taytus5d ago

At current prices, and considering these OS Models' performance, investing in local inference sounds like a bad idea.

2 more replies

trollbridge5d ago

There are open models with groundbreaking innovations, like MiMo-2.5-Pro-UltraSpeed which you simply can't get anywhere else (there is no other model with those capabilities that I can get with 1000 token/second speed).

realusername5d ago

There's also a lot of benchmark trickery going on, it's becoming harder to see how the latest models really improved.

The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.

bonesss5d ago

I’m an LLM fan, but from an engineering perspective the idea of building atop services that palpably fluctuate in capacity, performance, and capability is nutty.

Even with minor automation I feel like I can watch OpenAI and Anthropic engineers fiddling in real-time. Tuesdays behaviour changes by Thursday, 10AMs production isn’t possible at 11:30AM. Nutty.

2 more replies

intothemild5d ago

Since I started running my own inference server, I've had zero degradation that I didn't do myself. Basically the only time I see it get worse is if I drop one of the quants.

Which is what I suspect the providers are doing to fit more inference on the same amount of hardware over time.

Barbing5d ago

Interesting, Claude might be doing better since I last checked:

https://marginlab.ai/trackers/claude-code-historical-perform...

There were at least a couple of these degradation trackers.

fsuts5d ago

Agreed

4fffs5d ago

Correct. Anything else is pure marketing and you have fallen for it.

Aurornis5d ago

> I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models

I experiment a lot with the open models and I’m getting tired of this trope. I’m not yet convinced that even the best open weight models are equal to Opus from “a few months” ago.

I know what the benchmarks say. I had higher hopes. My real experience just doesn’t match the benchmarks.

I also do a lot of work that even Opus 4.8 struggles with. When even the cutting edge LLMs aren’t all the way there yet, my motivation to switch to something even further behind just isn’t there.

CamperBob25d ago

Have you found anything specific that the full-precision quant of GLM 5.2 can't do that Opus 4.8 can? I haven't, so far.

5.2 lives up to the hype. I don't find it to be the best at anything except coding. But for coding... yeah, it lives up to the hype. Not quite Opus 4.8-level, but I would feel comfortable comparing it to 4.5, at least if it had vision capabilities.

iot_devs5d ago

I would love if you could make some examples

OtomotO5d ago

> My real experience just doesn’t match the benchmarks.

That's exactly the problem I have... with Anthropic and "Open""AI"

itwaswatson5d ago

We have a provider with Deepseek V4 flash at our work. It can handle 95% of the "actually functional" workload at a tenth of the cost. I still pull up beefier ones sometimes, but that's after some consideration.

The moat is so flat, it only gives +1 food and +1 production. +1 gold with a road.

calgoo5d ago

Same, i feel that V4 Flash is great at task implementation, but im still looking at bigger models for design. Now, GLM 5.2 with high thinking is actually getting really close now. I have switched for all personal projects right now and am quite happy with the results. I think the magic is in the big context window (1m) + a lot of thinking gets us very close to at least Opus 4.6 level. Im currently running directly on z.ai with a lite coding plan, and have bought API credit on deekseek as well. I will be looking at EU based hosts next and then i might switch over some of the more critical flows.

dwoosley5d ago

The only reason I'm on HN right now reading this post is because the Anthropic's API is down... so there's another point for self hosted.

qznc5d ago

To be a little bit more precise than "a few months behind", what probably matters is before or after "Claude Opus 4.5 from Nov 24, 2025". That was the model which started the OpenClaw hype over Christmas.

taormina5d ago

For that matter, the new models are shit. If I’m using Opus 4.6 anyway to get anything actually done, then great, we’re actually entirely caught up then.

827a5d ago

Intelligence is maybe a few months behind. But cost sadly is further behind. GLM-5.2 has a deceptively high cost during day-to-day usage for e.g. coding because 1) it has to think a ton more than GPT-5.5/Opus-4.8 to get to competitive results; 2) many providers are still figuring out caching; and 3) API pricing for Codex/Claude can be as high as 40x more than subscription pricing, which distorts the market.

Gigachad5d ago

The reason for me is work pays for Github Copilot which doesn't have these open modals.

TacticalCoder5d ago

> I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models.

The really interesting thing is that it's typically those very same accounts who were explaining, a few months ago, that thanks to their commercial model they were gaining so much time and producing so much fantastic code.

A few months passes and suddenly the open-source model have caught up with the models that were gaining them so much time and that produced amazing code (in production everywhere for sure btw) but... It's impossible to work with these models.

Rinse and repeat.

The current models, according to them, are basically AGI and they can go fishing while paid subscriptions solve the world's problems.

But when it six months there shall be new closed, pricey, models and when the open ones shall have reach the level of Fable, we'll hear how it's impossible to work in late 2026 on a model that is "only at the level of Fable".

These people should have been snake-oil salesmen (and it could be what they actually are).

nemomarx5d ago

My most charitable interpretation that there's some honeymoon effect for each release, and people genuinely feel very productive and useful for 2-3 months. By the time the next big model release happens they've seen some issues or run into something that makes them feel like the new model will fix all that and improve their flow so much, etc.

Not unusual in the tech space, but this has been basically constantly happening for two years now? I can't imagine the improvements are more than incremental at this point.

windexh8er5d ago

They are generally referred to as the Kool-Aid drinkers. There's always something holding them back from open models. It's no different than the argument in the article. I've been daily driving Linux for well over 20 years at this point and while things have gotten easier they haven't gotten that much easier. There's always been a distro that's focused on new users or ease of use. I used to take for granted the Linux distro ecosystem but now worry how Microsoft, Apple and others will continue to try and legislate compute into a corner. I can appreciate good engineering, but when I look at OS X and Windows they're both failing end users in different ways.

Just like the OS ecosystem I think we'll see a similar trajectory with OAI, Anthropic and Google but on a much accelerated time scale. I think the lobbying has begun to lock in their fate for revenue - because none of them give a shit about their users. I do hope, however, that Anthropic continues to over rotate and continue to gimp their models into uselessness. I just asked Opus 4.8 the other day to look at some code as an adversary and summarize areas that should be addressed. Nothing specific and it shut down the conversation. However starting a new prompt and prodding the model from a different angle yielded the results I asked for directly. Pick a lane. Or, don't and continue to lose industry respect and consideration.

tonfreed5d ago

Even just one of the smaller models is good enough for the grunt work I use them for 90% of the time. Currently doing most of my home hobby projects with OpenCode Go and Qwen 3.7 Plus, it's not great at diagnosing issues in the code, but if I can clearly articulate a test suite or boilerplate refactoring it works fine.

moomoo115d ago

ok but your competition using the latest models has an advantage

not all of us are doing noob shit lol

handoflixue5d ago

You're being entirely unreasonable. 640 kilobytes of memory was enough for Bill Gates, and yet somehow your special project needs more?

59nadir5d ago

Heh, if you're using LLMs heavily for work I think odds are pretty good you're doing pretty trivial stuff. It might not be trivial to you, but you're probably just not very good at this.

pkulak5d ago· 11 in thread

Sure. But OpenAI is the same price. Why would I pay $18/month for z.ai when OpenAI is $20/month?

CJefferson5d ago

One big advantage I’ve found — people get attached to models (including me). With open models if you find one that works perfectly for you but the next version doesn’t, you can run the old one forever (or someone will for you)

itake5d ago

But… the models will fall behind. As libraries and languages and tool calling updates or the world knowledge changes, the models decay.

Personally, I don’t like the change, but it’s just how technology works so I’d rather move with the flow than try to stick my foot down and freeze time.

hypfer5d ago

> But… the models will fall behind.

Yes but why does that matter? If I am happy with its capabilities now, I will continue being happy with its capabilities in the future.

Yes, it cannot do the newest magic shit, but why does that matter? It can still do everything that existed up until that point, which is _a lot_.

Eventually, you might also need something new, but it's not like the world shifts over all problems that exist from <old> to <new> and any tech for <old> problems suddenly becomes obsolete?

1 more reply

OtomotO5d ago

No problem, "AI" will just write its own frameworks and libs then!

taytus5d ago

This is a good point I never thought of. I appreciate it.

0xbadcafebee5d ago

One reason might be request limits. OpenAI's ChatGPT Plus w/Codex ($20/month) provides a worst-case 5-hour-request-limit of 15 for GPT-5.5, 20 for GPT-5.4, 60 for GPT-5.4-Mini. Whereas Z.ai Lite ($18/month) provides a worst-case of ~80 for GLM 5.2 (off-peak; on-peak is 2am-6am New York time). So Z.ai can provide higher limits for a cheaper price. (https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/...)

pbgcp20265d ago

Subscriptions are done. By the end of 2026 everyone will be paying for actual mils of tokens consumed, via API calls.

fulafel5d ago

https://news.ycombinator.com/item?id=48618455

pkulak5d ago

I pay month to month.

notatoad5d ago

the pricing page doesn't seem to call it out anymore, but the claim on z.ai coding plan used to be 3x the usage of the equivalent-price claude plan. whether that's accurate i don't know, but just based on api pricing GLM is way cheaper.

flexagoon5d ago

OpenCode Go is $10/month and the limits are much more generous than those or Codex

causality05d ago· 7 in thread

I know open models have gotten quite good in many tasks such as coding or composition, but are there any that can access the internet and retrieve data like ChatGPT, Claude, etc can?

I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".

sleepyeldrazi5d ago

That's why I like qwen3.6 27B, it has 0 ego, it knows that it doesn't have complete world knowledge, so when it sees a web_search tool it searches all the time. Even qwen3.5 9B is mostly search-eager (but given the size, it's weaker on reasoning on the results if that's needed). I use a stock pi harness with only web_search and web_fetch (cleans up the html to only keep text) tools defined.

I have given up on making Opus actually retrieve online information for me. At this point I only query it side by side with qwen to laugh at how it didn't even attempt to search properly, and how a small local model is beating it every time. Gemini is very fast for searching, but somehow miss-sources all the time.

wilj5d ago

> I know open models have gotten quite good in many tasks such as coding or composition, but are there any that can access the internet and retrieve data like ChatGPT, Claude, etc can?

The things you describe are just tool calling, they're a feature of whatever harness you use. Use OpenCode, pi.dev, or maki.sh with any of the open models.

> I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".

You can do most of this with some system prompts added to whatever agent you're using. You can do it from the settings on the claude/chatgpt websites too. (minus the no-guardrails thing)

newwttbreak5d ago

What are good resources and forums where I can figure out these system prompts to bypass guardrails, atleast on agents?

JSR_FDED5d ago

Just go to kimi.com and try for yourself (not affiliated, but happy user).

First time I did this I realized in 5 seconds that the big players weren’t going to be carving up the market between them.

linzhangrun5d ago

You can let the AI solve it itself, and then it will provide two solutions: implement a local search service (easily blocked), or purchase a Web Search API service

flexagoon5d ago

There are tons of existing Skills/MCPs for Google/Kagi/whatever search, and making your own is trivial. I gave DeepSeek in Pi a link to Kagi API docs and asked it to add a web search skill, and it did that easily.

tr_user5d ago

isn't that just in the harness?

bnj5d ago· 6 in thread

I’ve been wanting to get better acquainted with local inference but I don’t have the hardware, which has made me think about something I haven’t seen discussed, which is local collaboratives. The economics makes it seem like a group of people joining together to run good hardware and an open model might make sense, but I haven’t seen anything like this mentioned. Have I been missing it?

I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.

Aurornis5d ago

The reason you don't see more of this is because everyone does the math, realizes it's not a good deal, and then gives up on the idea.

There's a post at the top of /r/localllama about this exact math right now: https://www.reddit.com/r/LocalLLaMA/comments/1ubrcwj/tokenom...

TL;DR: Running GLM 5.2 is going to cost about $20K minimum, and that's going to be painfully slow compared to the cloud hosted versions. Even the estimates where the server is computing tokens 24/7 you can't break even for several years.

The only reason to run locally is if complete data privacy is your top concern. You pay a high premium for that.

FridgeSeal5d ago

I mean sure, I’d you’re attempting to run the biggest possible models, it’s going to require a stupid amount of compute? I thought we all knew this?

The appeal to me is that we can run that, but we can also run smaller models on your laptop _and it’s functional!_ I can run DeepSeek v4 flash and a qwen 3.6 on my laptop! Thats crazy good.

markerz5d ago

There are plenty of providers of open models that offer very affordable rates. Generally, I recommend looking at OpenRouter since they track various metrics for the various providers.

uberex5d ago

https://news.ycombinator.com/item?id=48524387

blackoil5d ago

Open models hosted in Cloud???

pbgcp20265d ago

AWS Bedrock hosts Gemma 4 31B and this is The Best Deal – hands down. Try it. Vertex also has Gemma 4 MoE version. Not "lobotomised" by quants. There are also GLM (latest) and Qwen / DS (but these two are not latest versions)

DANmode6d ago· 6 in thread

But, what model are you using?

and what hardware are you using?

0gs6d ago

yeah, on a 96GB Mac Studio and Gemma+Qwen, it's definitely fully doable. fully doable but not really for coding on 16GB. but svelter models and cheaper (eventually) hardware are coming!

nezuzen6d ago

"cheaper (eventually) hardware" Best case 2-3 years from now. Otherwise it will take a major global recession to get us anywhere near last year's prices.

marcus_holmes5d ago

Macs are expensive hardware, but I'm always seeing people running LLMs on them. Is anyone running on cheaper generic hardware and Linux?

brucehoult5d ago

A Mac is cheaper than a high end GPU with the same amount of RAM.

2 more replies

Gigachad5d ago

I suspect hosted and local will converge when hardware prices come down and API prices go up. The massive rate of datacenter build out will be unsustainable. Right now the hosted models are massively cheaper than buying the hardware and running it yourself which signals that hosted is very subsidized.

fluidcruft5d ago

If you don't have that hardware thr math of buying a depreciating computer is challenging if you are satisfied with the $100/month plans ($1200/year). A 96GB Mac Studio is ~$4k. I think if you have the hardware already as a sunk cost then yes it makes sense. But I'm not sure it is worth spending $4k for today's hardware vs waiting for newer hardware in a few years.

radhitya5d ago· 3 in thread

Have you read about Opencode Go? They are great provider for open model, like GLM 5.2, Deepseek v4 Pro, Kimi 2.7 Code. You should give it shot to them :-)

2muchtime5d ago

The amount the HN community, at least from what I’ve seen, is sleeping on OpenCode Go (and zen) is kind of amazing.

$10 a month gets you generous usage with the best open weight models and they claim to have zero retention and not to train on your usage.

It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.

johndough5d ago

> It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.

The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.

The advantage of OpenRouter compared to OpenCode Go is that the price for DeepSeek-V4-Pro and MiMo-V2.5-Pro is better on OpenRouter.

For example, DeepSeek costs $0.435/0.87/0.003625 for 1M in/out/cached tokens (https://openrouter.ai/deepseek/deepseek-v4-pro), compared to an equivalent of $1.74/3.48/0.0145 under the OpenCode Go plan (https://opencode.ai/docs/go/#usage-limits), almost exactly 4x.

But since you get a monthly usage limit of $60 with the OpenCode Go plan for $10 (i.e. 6x), you might still come out ahead if you use it a lot (or use other models, where the pricing difference is smaller or non-existent).

2muchtime5d ago

So the cost makes sense I was unaware but

“The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.”

Opencode Go gives you a choice between “the best” open weight models and you’re not tied down to just GLM or MiniMax and Zen gives you an even longer list of providers including Claude and GPT?

Is it that Openrouter gives you access to like… every model and provider?

linzhangrun5d ago· 3 in thread

Open source models are still not good enough for now, but with the current speed of one new SOTA every two months, by this time next year we will definitely have cheap open source models at least as good as Fable :)

sho5d ago

I don't think we will. The open model labs are too resource constrained to approach Fable or even Opus on the general case and I don't see that changing within a year.

Right now, due to profound shortfalls in both data and hardware compared to the US labs, the OSS models are IMO basically technology demonstrators that in practise are even more jagged than the US labs' efforts. The high points of the jaggedness are close - but number of happy paths is many times fewer, and their behaviour inside the harness is far less refined. Barring some incredible breakthrough I don't think that is changing without a much higher level of resources - which seems impossible given the current hardware environment.

I have no reason to think that Anthropic or OpenAI are in possession of some secret sauce that the Chinese labs can't duplicate given the right resources, but the fact remains that absent those resources they'll remain behind. Barring some incredible bombshell reveal from Huawei I don't think this asymmetry resolves in a year. In three years it may well be a different story.

linzhangrun5d ago

deepseek-v4-pro, probably the representative cheap opensouce LLM, was released in 2026.4 One year before, what OAI had in hand was gpt-4.1 and gpt-o3. I think it is not very controversial to say that deepseek is stronger than them, at most you can point to some post-training problems, basically the instability you mentioned. Also I am not sure if it is because the people who are best at using AI -- the people making AI -- get more development speed as the models get smarter, but my feeling is model progress is getting faster and faster. GPT-3.5 and GPT-4 were almost one year apart. The disadvantage from hardware limits and compute shortage is visible from the size of chinese models. glm-5.2, which is claimed to be around opus-4.6 level in coding, is only 744B. But Chinese engineers are obviously, how to put it, getting very effective results on "performance at the same size". And that is not even talking about the advantages from China's electricity, manpower, or even "national will" to compete against America. So saying it may take three years to catch up with a gap that is now only several months looks too pessimistic. ChatGPT itself was released only three and a half years ago, and today is already a completely different world.

sho5d ago

You may be right, and I certainly hope so!

But the question was about whether the Chinese labs will have fable-equivalence in 1 year. I am by no means some kind of insider, but knowing the vaguest outlines of what went into Mythos, they just can't do it. The compute is not there. The Chinese engineers are incredible, but they're not literal magicians.

Of course there could be something incredible to come out of left field and overturn the apple cart yet again, but that's speculation. It would be awesome, sure! But I wouldn't bet too heavily on it.

And FWIW - again, no disrespect at all to the Chinese engineers but I don't rate GLM5.2 as being even close to opus 4.6. It can hit a few benchmarks, sure, that's the top edge of the "jag". But filling in the rest of the capabilities - again, it takes compute and data the OSS labs just don't have, that anyone knows about at least.

tumdum_5d ago· 2 in thread

I find the attitude shown in this post very surprising. On the one hand, the post starts with a story of adopting Linux and other FOSS. The core of FOSS is giving its users the ability to understand and modify software they run. On the other hand, the rest of the post is about using a tool (LLM) that the author has no way to modify and no way to understand. Huge matrices of floats are at best comparable to compiled code. But the reality is even worse - it’s actually easier to decompile and understand proprietary software. Not to mention the fact the most of the time users can’t even run the “open” models since it requires hardware that most can’t afford.

How did we get from prising software freedoms to this?

5424585d ago

I’d disagree wrt “modify”. There are all sorts of tools for modifying LLM weights (ie to remove refusals, remove layers or experts, merge models, finetune, and more) and a quick glance at huggingface or civit will show those in very active use.

I don’t think the hardware requirements are relevant. If a research lab publishes the code their particle collider runs under the GPL, that doesn’t make it not OSS even though they’re the only ones on the planet with the hardware to run it.

tomjakubowski5d ago

You can also edit binary distributions of models with means besides changing their weights. See "LLM Neuroanatomy: How I Topped the LLM Leaderboard Without Changing a Single Weight."

On the spectrum of:

  careful engineering--hacking--mad science

This kind of thing falls far towards the mad science end of the scale, but has proven effective.

https://dnhkng.github.io/posts/rys/

whatever15d ago· 2 in thread

Claude started becoming useful for my coding purposes after it hit version 4.6. After that sure some nice to have additions but I think if I had 4.6 sonnet & opus as open weights, I would not need something more.

Having played a bit with Fable, reinforced the above.

JeremyNT5d ago

Yeah for me the coding inflection point was relatively recently (GPT 5.3 perhaps). There's just a threshold they have to hit to be consistent enough to avoid having to redo work and only the later models started delivering it.

This certainly seems feasible for open weight models eventually, but I'm still extremely skeptical of the claims about reaching this level with any open weight model that can be run locally (nevermind the hardware costs to do so practically).

ch4s35d ago

I agree and I'd love for local models to hat the sonnet 4.6 level but nothing seems really all that close, and I'm not particularly excited about giving money to deepseek.

mdale6d ago· 2 in thread

I think the frontier will command premium for sometime just as slight better software developers were 10x's vs their peers as their architecture & development strategies and code approach compounded quickly. One less error per block of work compounds quickly.

Sure, there may be some cases and reasons for local models and industry is so large they will continue to make progress and gather economic value and users for specific use case; but frontier will command vast majority of the economic value distinct from Linux and open source where the model created better than proriatary economic incentives around development

byzantinegene5d ago

10x developers were not slightly better than their peers, they were vastly superior and faster. OTOH, the lead of frontier llms is diminishing as training is getting diminishing returns.

Also, on that note. Not every company needs 10x developers, just as not every task needs frontier llms. Ultimately, operating costs will be the largest contributing factor.

4fffs5d ago

Youre clutching at straws.

Ultimately its a financial game. Open source is far cheaper so it already has an upper-hand. Frontier models have to justify financially why they are worth the additional spend.

Aurornis5d ago· 1 in thread

The headline says one thing, then the article text says this:

> I’m hoping it’s going to be minimal.

I have multiple subscriptions and I pay per token to try out different LLM providers through OpenRouter. I also run open weight models locally.

I just can’t agree yet. The models from Anthropic and OpenAI really are that much better than anything else. The open weight models must be universally benchmaxxed across the board because my real world experience with them is very different than what the benchmarks imply. I get downvoted a lot for speaking about my experience because I don’t think it’s the reality that people want to hear right now, but it’s true for complex work.

I do think there are a lot of easier tasks that can be handled appropriately by the open weight models in the hands of a skilled operator. If an entire job is simple enough that you wouldn’t hesitate to hand it off to a junior with a little supervision then any model will do. However for a lot of the work I do, even Opus 4.8 on Max requires a lot of attention and extra steering and review to keep it on track. Fable did, too, though to a lesser degree. When I try to use the big open weight models (hosted, because they’re not running at reasonable speeds locally at a quantization I can tolerate) it feels like I spend more time waiting while they burn tokens for output that I probably have to reject anyway, at least for the bigger tasks. I wish they were there, but that’s not the case yet.

iot_devs5d ago

Do you have any example?

1 more reply

PeterStuer5d ago· 1 in thread

While I agree with some of the gist of the article, 2 remarks:

1. Unfortunatly in my tests the open models do not (yet?) rival, at least Claude Opus, for software development/engineering and adjacent tasks.

2. Enjoy while it lasts. I'll be genuinly amazed these open models will not be declared 'illegal' under some security pretense by the end of the year. And I say 'pretense' because the primary driver will be regulatory capture and industry protectionism.

mirekrusin5d ago

Banning models in US just strengthens competing states, ie. China.

1 more reply

reacharavindh5d ago· 1 in thread

It was easy to be a rebel and use Linux when it was clearly competent, but needed hacks and extra elbow grease to get it polished for use. IME, the open models are “not there yet” in terms of capability or operational needs. Sure, GLM5.2 looks competent, but I will only be able to get it to run that competent if I had a huge cluster of GPUs.. if I am accessing an open model via hosted API, I might as well run a closed model via hosted API. The incentives fall apart in comparison to using Linux 15 years ago.

Don’t get me wrong. I wish I could run a local model and be happy about it. At the moment, I’m not.

hypfer5d ago

> if I am accessing an open model via hosted API, I might as well run a closed model via hosted API.

uh.. no?

The whole thing is that it cannot be enshittified, because there's not just a single party having control over it.

As it has happened, is happening and will happen.

With open weights, you cannot easily be rugpulled or locked out or any of that stuff. If the corp attempts that, someone else with an server farm will gladly take you as a customer with absolutely 0 changes to your workflow other than swapping out the API URL + Key.

You'll be talking to the same model with the same personality and same knowledge.

arttaboi5d ago· 1 in thread

I guess this will happen soon. There are two catalysts needed for this to happen:

1. Evals that can quickly tell you how much downside there is to switching 2. Something like OpenRouter that can help you run those evals quickly

Now #2 is starting to become popular, and I think we'll soon see more people adopting a model-agnostic approach. Of course, there will still be high-intelligence use cases where nothing comes close to Claude or GPT.

alexhans5d ago

Exactly. I'm very happy the discourse has moved on from "but X model is the best" to "you can use open models".

Whether you're using SDK or harness based agents, having evals means you're able to modify any part of your agent and still know what satisfies your "good enough".

It's great for designing products that are easy to change as well.

Animats5d ago· 1 in thread

OK, now what? Someone offers open models as a service? That's basically a time-sharing computing business - people at terminals sharing remote computing resources. If you buy your own H100 it will be idle while you're typing or reading or thinking. So sharing makes sense.

But it doesn't have to be an "AI company". It's just a compute service. The companies that offer web hosting could get into this.

flexagoon5d ago

> The companies that offer web hosting could get into this.

They already do. DigitalOcean is one of the providers on OpenRouter, for example

spiralcoaster5d ago

This can't be for real.

The title asserts there is minimal downside to switching to open models, but the article provides zero evidence that this is true, and the author hasn't even attempted it yet. The end of the article states "I’m hoping it’s going to be minimal".

I wonder if I can get a post to the front page with the title: "There are no real barriers to humans colonizing Mars next month". And at the end, "I'm hoping there are no real challenges."

3 more replies

DrScientist5d ago

What's amazing about these models is they are effectively a distillation of the internet in something that can fit onto your local machine [1] and be queried via natural language.

[1] It seems inevitable that decent local models will be possible as the technology and the hardware is improving at a rate beyond the growth of the knowledge base to be distilled.

GL265d ago

What makes an open model worse is ultimately the budget : you have access to worse data, not SOTA models, less GPU compute time, and having a good fine tuning team is extremely expensive. Linux works because the entry barriers are purely on a software side : a lot of contributers all around the world can outclass any OS by contributing on their scale to Linux. All you need to contribute is a computer, and your brain. Open models don't have the same community push, they rely on core ressources that not anyone owns. And injecting them in the model costs too much money. If there are no public breakthroughs in the way we train large open models that makes community led models 10x better, the shift to open models will never happen on a large scale.

anuramat5d ago

there is zero downside to not switching though: just use claude while it's good and subsidized, switch if rugpulled

1 more reply

implexa_founder4d ago

i'm tired of this narrative about switching to open source models. If people wanted to switch... well they would have switched! The amount of effort Anthropic and OpenAI are putting behind inference, harness, building great agentic applications on top of the frontier model... IS NOT A SMALL THING.

Think about it, if you want to use an open source model to run an agent over long horizons, you need someone to offer it reliably. And if all compute goes to frontier models there is very little left for even startups to build agentic companies on top of these open source models.

_pdp_5d ago

There are downsides depending on how good is your harness. Switching the model is easy enough. Ensuring that the harness continues working the way it did is a completely different thing. This is not just about the prompts but also general behaviour around the model and its infrastructure.

So while it is not complicated and certainly something that can be solved, it is not plug and play.

That being said, we switch to open weight models earlier this month and the results has been more than positive so far. The cost savings are also hard to dismiss.

c-b5d ago

What's confusing to me is that there is no discussion about the actual downside experienced it's just theoretical.

1 more reply

hungryhobbit5d ago

What a stupid and pointless article. It's like OP decided "I might go for a walk today" and then wasted his own time writing an essay about how he might go for a walk ... and then wasted hundreds of people's time publishing it!

snootypoot5d ago

if you can afford to run one that is powerful enough for what you need it will be the superior choice. nobody should be feeding private data to the axis of evil (open ai, anthropic, palantir). i saw a job listing recently about an ai proficient data guy needed to work in a medical office. i thought to myself that there was very little chance they were cognizant of how medical records privacy laws conflict with non-local ai use. things are getting unhinged in the world.

ZeroGravitas5d ago

It seems the best self-hosted and the worst models served by big providers has some considerable overlap in quality.

Whatever reason people have to run those (cheaper? backwards compatibility once you get something running) surely applies to the open models too, maybe even more so.

myzek5d ago

Any tips on which model to use and how to use them? I have 64 RAM and 16 VRAM (I know it's not a lot, it's a gaming GPU) and I'm trying to find a good model to use but it's a bit of a struggle

peter_retief5d ago

What open models are "recommended"?

I like the Linux analogy, I struggled with Linux way back.

petesergeant5d ago

Headline: "The is minimal downside"

Article: "I’m hoping it’s going to be minimal"

PcChip6d ago

Is it just me or is half the article missing?

I enjoyed the first part though

OtomotO5d ago

I am absolutely pro local and true open source models.

Personally I haven't seen any productivity gain since Opus 4.5 times.

But: I can't fully get behind the opinion that (so called) "open source models" are simply superior and will be in the future, because when I asked some models who they are, they answered with "I am Claude from Anthropic", which could mean they have been trained by exfiltrating Claude.

I have NO moral objection to this, as Anthropic and "Open""AI".also trained their models on anything they could get their hands on.

It's more about the question: can and will these models be updated, even if Anthropic et al fail. Who's gonna pay for training then? What's their incentive? Have we reached a plateau?

1 more reply

cpill5d ago

I think once the hardware process comes down and these mini DGXs become cheaper, and by then open models still be smaller and better, there is going to be less and less reason to use the providers. CEOs are already complaining that they are costing too much. There are also large organisations like Banks which can't use external services and are already looking at internal housing. it's a good thing so the big AI companies just went IPO as once the self hosting trend kicks in they are going bust.

aussieguy12345d ago

>There was a time not too long ago when using Linux entailed some professional risk1. First there was compatibility: you may not have been able to render a Word document or PowerPoint correctly, and you might have had to trust Open Office’s export capability to render docs the way you wanted

For a while during this era, I used to port my laptops windows installation into a virtual machine that can run on Linux. It took a bit of hacking away but I could usually do it in a day or two. Then its all Linux with the windows vm being used for the microsoft stuff.

blindriver5d ago

As someone that has pretty powerful desktop that I've been using with local open weight models, people are far exaggerating the quality of them. Some of them are now useful. They don't compare yet to the online models of ChatGPT, Claude, Gemini, etc. They are still about 18 months behind. I have accomplished useful work with them, like image classification on Gemma4, but they are much much slower, much much more expensive and they don't scale at all.

A $10,000 RTX 6000 Blackwell card will pay for 500 months of Claude or Codex, which is 40 years worth of compute. Obviously they are going to raise their prices, my prediction being to $200-500/month, but that still makes them at least years of compute and they scale very well with more traffic. Single GPUs do not, they are pegged at 100% and good luck getting it to answer multiple queries at the same time.

epolanski5d ago

I unsubscribed from Anthropic and our (EU-based) team is moving to an "ai-server" running opencode + GLM 5.2 and DS4.

There are several benefits:

- we cut AI spending by thousands

- there is one AI server and starting different sessions for each user, one memory/skills/etc and everybody is involved into reviewing what went wrong and why. Harness finally makes sense and pays off more.

- we can trust that the models are those that we run and not black boxes

- no more money flowing to US narcissistic entrepeneurs and no more business being tied to US legislation

Not gonna lie, GPT 5.5 Pro and Fable 5 were a tiny bit ahead, especially on longer vibecode-style tasks, but it's just not worth it.

root_axis5d ago

Imagine taking 6 months longer to release your cookie cutter CRUD app.

j / k navigate · click thread line to collapse

316 comments

158 comments · 36 top-level

coffinbirth5d ago· 43 in thread

That's why I'm using eurouter.ai with the following routing rule for all my requests:

  {
    "model": "glm-5.2",
    "models": [
      "deepseek-v4-pro",
      "deepseek-v4-flash"
    ],
    "provider": {
      "allow_fallbacks": true,
      "data_collection": "deny",
      "data_residency": "EU",
      "max_retention_days": 0,
      "eu_owned": true
    }
  }

Sure, it's quite expensive, but at least on a legal side data privacy is ensured. I trust them more than e.g. Anthropic, OpenAI or OpenRouter.

Personally, I find it morally unacceptable to use U.S. AI tools, because I do not want to support them financially and thus support the crimes they are involved in[1].

[1]: https://news.ycombinator.com/item?id=48512339

himata41135d ago

My conclusion from the very start is that Anthropic's strategy are pure optics and considering the fact that there was an outpoor of support for the company I think it has been very successful.

dminik5d ago

Yeah, it was funny seeing a bunch of people going like "Anthropic is fighting for privacy" meanwhile I'm like "Uhh, what about the other 8 billion people?"

On second thought, it's not funny.

beng-nl5d ago

nerdsniper5d ago

cbolton5d ago

They can include these limitations in a contract which can be enforced like any contract.

2 more replies

oefrha5d ago

> anthropic red lines

Alleged red lines. Could be just talking points for garnering sympathy. Big tech aren’t exactly known for being truthful, especially big tech partnering with esteemed Palantir.

1 more reply

throwaw125d ago

> The part that gets me about anthropic red lines is "of Americans", okay so the rest of the civilized world is up for grabs then?

And this is coming from a CEO who constantly claims moral superiority and advances the idea that China is bad

avadodin5d ago

These companies are so good at selling their product's likely incompetence as possibly intentional subversion.

johndough5d ago

I had a look at eurouter.ai and it seems like an extremely bad offer.

- The prices are ridiculous (15 % markup for free account).

- They have a rate limit of 1000 requests per month, unless you pay 40€ per month for ... what exactly is their value proposition?

qznc5d ago

I guess the prices are for "EU owned" instead of "EU hosted". The data centers in the EU where you can rent GPUs is mostly US companies.

trollbridge5d ago

It looks like a business opportunity, then, to provide inference that is EU-local and/or EU-owned.

If there aren't enough businesses who want to do this, the EU should figure out how it can properly incentivise that to change.

imhoguy5d ago

Hosting anything in EU must cover redtape and carbon taxes in electricity bill.

Grombobulous5d ago

That seems pretty unsubstantiated. Hetzner proves that EU data center != expensive.

Low carbon does not equal expensive, either. Solar is the cheapest power generation method. Solar plus grid scale batteries is in the same cost ballpark as natural gas.

jampekka5d ago

The markup is not going to the providers, only the router. It seems more like eurouter found a niche it can milk for a while.

KronisLV5d ago

Actually got curious about other alternatives to OpenRouter and looked into it a bit.

EURouter (Amsterdam): https://www.eurouter.ai/pricing

Eden AI (France): https://www.edenai.co/pricing

nexos.ai (Lithuania): https://nexos.ai/pricing/

Requesty (Germany): https://www.requesty.ai/pricing

Cortecs (Austria): https://cortecs.ai/pricing

Nordference (Estonia): https://nordference.ai/pricing

root-parent5d ago

Crimes does not even starts to describe it:

"AI-assisted targeting in the Gaza Strip" - https://en.wikipedia.org/wiki/AI-assisted_targeting_in_the_G...

"Palantir allegedly enables Israel's AI targeting in Gaza, raising concerns over war crimes" - https://www.business-humanrights.org/de/neuste-meldungen/pal...

"What The Wounds Are Telling Us" - https://www.volkskrant.nl/kijkverder/v/2025/gunshot-palestin...

bandrami5d ago

If data security is an actual concern I don't think there's a solution other than biting the bullet and self-hosting.

fg1375d ago

helloplanets5d ago

https://support.claude.com/en/articles/15425996-data-retenti...

fg1375d ago

The original comment is about GLM/deepseek models. As you already pointed out, this applies if you use those specific Claude models on any platform, so I don't know what the point is.

quikoa5d ago

> it's not an European company serving the LLM

That's a pretty big downside if data privacy and sharing is one of the main concerns.

fg1375d ago

I'd like to see some real reasoning here that is based on facts.

1 more reply

trollbridge5d ago

The great part about open models is that you can do this.

Do you have a sound reason to need EU data locality? You can.

Do you want the confidence (and are willing to accept the expense) of only running models on local hardware you control? You can.

Do you want the cheapest possible option - choosing a Chinese, for example, provider, or perhaps a provider offering it for free where you agree they can use your prompts? You can.

Do you have moral objections and want to actually live by them? You can.

ttoinou5d ago

You dont care about which exact provider it is using behind the hood ?

Phlogi5d ago

No, as long as they follow the requirements, especially the data privacy agreements. What would you? Price?

fredoliveira5d ago

Output quality immediately comes to mind, of course.

1 more reply

ttoinou5d ago

How do you know they're following requirements ? At least a quick search about the company providing the service would be useful

1 more reply

codedokode5d ago

Not only it requires a minimum payment of 39 euro, it doesn't accept cryptocurrency althogh that can be worked around by buying a prepaid virtual card for crypto.

yogorenapan5d ago

What services give you a prepaid virtual card for crypto without KYC?

WhyNotHugo5d ago

simianwords5d ago

GDPR compliant llm was a joke a few months back but here we are

speedgoose5d ago

I work in Europe, sometimes with sensitive data, and LLMs weren’t an exception a few months ago.

Maybe it was funny to you, but designing data platforms that respect GDPR and involve LLMs is a thing.

throw12345678915d ago

But is no joke anymore.

vonneumannstan5d ago

Lol what performative shite. Chinese astroturfing 101. You're either mentally ill or a shill.

throwaway274485d ago

Why use EU specifically? I get not trusting the US, of course, but surely the EU isn't far behind in its desire to spy on its own citizens. Do you not live there?

earthnail5d ago

From all the large governmental institutions, the EU is the one currently holding up traditional western values. That gives it street cred in this subject.

kortilla5d ago

>traditional western values

This seems tautological because Europe is pretty weak on the values that people in the US might care about (freedom of speech, limited govt, etc).

What values specifically are you optimizing for here?

2 more replies

throwaway274485d ago

Edit: c'mon people, if you're going to use such ambiguous phrases at least have the spine to clue the reader in to what you want them to refer to in this context.

1 more reply

hdgvhicv5d ago

https://www.theguardian.com/us-news/ng-interactive/2026/feb/...

The age old joke;

A Russian and an American are drinking at a bar

The Russian says "I'm impressed by american propaganda. It's so subtle but effective."

The american responds "What are you talking about, we don't do propaganda."

2 more replies

cpursley5d ago

8 more replies

0xDEAFBEAD5d ago

"The situation for free speech in Europe is even worse than I thought"

https://eternallyradicalidea.com/p/the-situation-for-free-sp...

1 more reply

Phlogi5d ago

US Data Privacy is not sufficient.

throwaway274485d ago

For what? Does the EU not want to spy on its citizens? That strikes me as... unlikely.

Why not host in east asia? Or southeast asia? Or south america? Or africa? Then you avoid both the government with incentive to spy on you (assuming you live in the EU) and american companies.

3 more replies

julianlam6d ago· 32 in thread

I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models.

Your codebase didn't change, so use the open weight model. Don't move the goalposts.

kgeist5d ago

Every new proprietary model is "groundbreaking" and "look, it just solved task X that no other model could solve," only to be referred to as "that crappy previous-generation model" a month later.

jbverschoor5d ago

Not only that, but to me it seems that after a week the intelligence is being downscaled or routed. Maybe because of lack of capacity

matheusmoreira5d ago

inigyou5d ago

What if the new model is exactly as good as the last model on launch day but better than the last model was on the new model's launch day because it was degraded? Every single time?

3 more replies

manyatoms5d ago

Unless what you're getting is really explicitly spelled out in a contract, you should flatly assume that they're doing whatever they like whenever they like.

1 more reply

LPisGood5d ago

People talk about this a lot. What I have never seen is a discussion of methods they might employ to degrade the models.

Let’s say I’m a bad faith LLM operator, and I want to degrade my model so the next release looks better and people want to switch to the more expensive one. How would I do that?

3 more replies

taytus5d ago

At current prices, and considering these OS Models' performance, investing in local inference sounds like a bad idea.

2 more replies

trollbridge5d ago

realusername5d ago

There's also a lot of benchmark trickery going on, it's becoming harder to see how the latest models really improved.

The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.

bonesss5d ago

I’m an LLM fan, but from an engineering perspective the idea of building atop services that palpably fluctuate in capacity, performance, and capability is nutty.

Even with minor automation I feel like I can watch OpenAI and Anthropic engineers fiddling in real-time. Tuesdays behaviour changes by Thursday, 10AMs production isn’t possible at 11:30AM. Nutty.

2 more replies

intothemild5d ago

Since I started running my own inference server, I've had zero degradation that I didn't do myself. Basically the only time I see it get worse is if I drop one of the quants.

Which is what I suspect the providers are doing to fit more inference on the same amount of hardware over time.

Barbing5d ago

Interesting, Claude might be doing better since I last checked:

https://marginlab.ai/trackers/claude-code-historical-perform...

There were at least a couple of these degradation trackers.

fsuts5d ago

Agreed

4fffs5d ago

Correct. Anything else is pure marketing and you have fallen for it.

Aurornis5d ago

> I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models

I experiment a lot with the open models and I’m getting tired of this trope. I’m not yet convinced that even the best open weight models are equal to Opus from “a few months” ago.

I know what the benchmarks say. I had higher hopes. My real experience just doesn’t match the benchmarks.

I also do a lot of work that even Opus 4.8 struggles with. When even the cutting edge LLMs aren’t all the way there yet, my motivation to switch to something even further behind just isn’t there.

CamperBob25d ago

Have you found anything specific that the full-precision quant of GLM 5.2 can't do that Opus 4.8 can? I haven't, so far.

iot_devs5d ago

I would love if you could make some examples

OtomotO5d ago

> My real experience just doesn’t match the benchmarks.

That's exactly the problem I have... with Anthropic and "Open""AI"

itwaswatson5d ago

The moat is so flat, it only gives +1 food and +1 production. +1 gold with a road.

calgoo5d ago

dwoosley5d ago

The only reason I'm on HN right now reading this post is because the Anthropic's API is down... so there's another point for self hosted.

qznc5d ago

taormina5d ago

For that matter, the new models are shit. If I’m using Opus 4.6 anyway to get anything actually done, then great, we’re actually entirely caught up then.

827a5d ago

Gigachad5d ago

The reason for me is work pays for Github Copilot which doesn't have these open modals.

TacticalCoder5d ago

> I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models.

Rinse and repeat.

The current models, according to them, are basically AGI and they can go fishing while paid subscriptions solve the world's problems.

These people should have been snake-oil salesmen (and it could be what they actually are).

nemomarx5d ago

Not unusual in the tech space, but this has been basically constantly happening for two years now? I can't imagine the improvements are more than incremental at this point.

windexh8er5d ago

tonfreed5d ago

moomoo115d ago

ok but your competition using the latest models has an advantage

not all of us are doing noob shit lol

handoflixue5d ago

You're being entirely unreasonable. 640 kilobytes of memory was enough for Bill Gates, and yet somehow your special project needs more?

59nadir5d ago

Heh, if you're using LLMs heavily for work I think odds are pretty good you're doing pretty trivial stuff. It might not be trivial to you, but you're probably just not very good at this.

pkulak5d ago· 11 in thread

Sure. But OpenAI is the same price. Why would I pay $18/month for z.ai when OpenAI is $20/month?

CJefferson5d ago

itake5d ago

But… the models will fall behind. As libraries and languages and tool calling updates or the world knowledge changes, the models decay.

Personally, I don’t like the change, but it’s just how technology works so I’d rather move with the flow than try to stick my foot down and freeze time.

hypfer5d ago

> But… the models will fall behind.

Yes but why does that matter? If I am happy with its capabilities now, I will continue being happy with its capabilities in the future.

Yes, it cannot do the newest magic shit, but why does that matter? It can still do everything that existed up until that point, which is _a lot_.

Eventually, you might also need something new, but it's not like the world shifts over all problems that exist from <old> to <new> and any tech for <old> problems suddenly becomes obsolete?

1 more reply

OtomotO5d ago

No problem, "AI" will just write its own frameworks and libs then!

taytus5d ago

This is a good point I never thought of. I appreciate it.

0xbadcafebee5d ago

pbgcp20265d ago

Subscriptions are done. By the end of 2026 everyone will be paying for actual mils of tokens consumed, via API calls.

fulafel5d ago

https://news.ycombinator.com/item?id=48618455

pkulak5d ago

I pay month to month.

notatoad5d ago

flexagoon5d ago

OpenCode Go is $10/month and the limits are much more generous than those or Codex

causality05d ago· 7 in thread

I know open models have gotten quite good in many tasks such as coding or composition, but are there any that can access the internet and retrieve data like ChatGPT, Claude, etc can?

sleepyeldrazi5d ago

wilj5d ago

> I know open models have gotten quite good in many tasks such as coding or composition, but are there any that can access the internet and retrieve data like ChatGPT, Claude, etc can?

The things you describe are just tool calling, they're a feature of whatever harness you use. Use OpenCode, pi.dev, or maki.sh with any of the open models.

You can do most of this with some system prompts added to whatever agent you're using. You can do it from the settings on the claude/chatgpt websites too. (minus the no-guardrails thing)

newwttbreak5d ago

What are good resources and forums where I can figure out these system prompts to bypass guardrails, atleast on agents?

JSR_FDED5d ago

Just go to kimi.com and try for yourself (not affiliated, but happy user).

First time I did this I realized in 5 seconds that the big players weren’t going to be carving up the market between them.

linzhangrun5d ago

You can let the AI solve it itself, and then it will provide two solutions: implement a local search service (easily blocked), or purchase a Web Search API service

flexagoon5d ago

tr_user5d ago

isn't that just in the harness?

bnj5d ago· 6 in thread

I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.

Aurornis5d ago

The reason you don't see more of this is because everyone does the math, realizes it's not a good deal, and then gives up on the idea.

There's a post at the top of /r/localllama about this exact math right now: https://www.reddit.com/r/LocalLLaMA/comments/1ubrcwj/tokenom...

The only reason to run locally is if complete data privacy is your top concern. You pay a high premium for that.

FridgeSeal5d ago

I mean sure, I’d you’re attempting to run the biggest possible models, it’s going to require a stupid amount of compute? I thought we all knew this?

The appeal to me is that we can run that, but we can also run smaller models on your laptop _and it’s functional!_ I can run DeepSeek v4 flash and a qwen 3.6 on my laptop! Thats crazy good.

markerz5d ago

There are plenty of providers of open models that offer very affordable rates. Generally, I recommend looking at OpenRouter since they track various metrics for the various providers.

uberex5d ago

https://news.ycombinator.com/item?id=48524387

blackoil5d ago

Open models hosted in Cloud???

pbgcp20265d ago

DANmode6d ago· 6 in thread

But, what model are you using?

and what hardware are you using?

0gs6d ago

yeah, on a 96GB Mac Studio and Gemma+Qwen, it's definitely fully doable. fully doable but not really for coding on 16GB. but svelter models and cheaper (eventually) hardware are coming!

nezuzen6d ago

"cheaper (eventually) hardware" Best case 2-3 years from now. Otherwise it will take a major global recession to get us anywhere near last year's prices.

marcus_holmes5d ago

Macs are expensive hardware, but I'm always seeing people running LLMs on them. Is anyone running on cheaper generic hardware and Linux?

brucehoult5d ago

A Mac is cheaper than a high end GPU with the same amount of RAM.

2 more replies

Gigachad5d ago

fluidcruft5d ago

radhitya5d ago· 3 in thread

Have you read about Opencode Go? They are great provider for open model, like GLM 5.2, Deepseek v4 Pro, Kimi 2.7 Code. You should give it shot to them :-)

2muchtime5d ago

The amount the HN community, at least from what I’ve seen, is sleeping on OpenCode Go (and zen) is kind of amazing.

$10 a month gets you generous usage with the best open weight models and they claim to have zero retention and not to train on your usage.

It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.

johndough5d ago

> It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.

The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.

The advantage of OpenRouter compared to OpenCode Go is that the price for DeepSeek-V4-Pro and MiMo-V2.5-Pro is better on OpenRouter.

2muchtime5d ago

So the cost makes sense I was unaware but

“The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.”

Opencode Go gives you a choice between “the best” open weight models and you’re not tied down to just GLM or MiniMax and Zen gives you an even longer list of providers including Claude and GPT?

Is it that Openrouter gives you access to like… every model and provider?

linzhangrun5d ago· 3 in thread

sho5d ago

I don't think we will. The open model labs are too resource constrained to approach Fable or even Opus on the general case and I don't see that changing within a year.

linzhangrun5d ago

sho5d ago

You may be right, and I certainly hope so!

Of course there could be something incredible to come out of left field and overturn the apple cart yet again, but that's speculation. It would be awesome, sure! But I wouldn't bet too heavily on it.

tumdum_5d ago· 2 in thread

How did we get from prising software freedoms to this?

5424585d ago

tomjakubowski5d ago

You can also edit binary distributions of models with means besides changing their weights. See "LLM Neuroanatomy: How I Topped the LLM Leaderboard Without Changing a Single Weight."

On the spectrum of:

  careful engineering--hacking--mad science

This kind of thing falls far towards the mad science end of the scale, but has proven effective.

https://dnhkng.github.io/posts/rys/

whatever15d ago· 2 in thread

Having played a bit with Fable, reinforced the above.

JeremyNT5d ago

ch4s35d ago

I agree and I'd love for local models to hat the sonnet 4.6 level but nothing seems really all that close, and I'm not particularly excited about giving money to deepseek.

mdale6d ago· 2 in thread

byzantinegene5d ago

10x developers were not slightly better than their peers, they were vastly superior and faster. OTOH, the lead of frontier llms is diminishing as training is getting diminishing returns.

Also, on that note. Not every company needs 10x developers, just as not every task needs frontier llms. Ultimately, operating costs will be the largest contributing factor.

4fffs5d ago

Youre clutching at straws.

Ultimately its a financial game. Open source is far cheaper so it already has an upper-hand. Frontier models have to justify financially why they are worth the additional spend.

Aurornis5d ago· 1 in thread

The headline says one thing, then the article text says this:

> I’m hoping it’s going to be minimal.

I have multiple subscriptions and I pay per token to try out different LLM providers through OpenRouter. I also run open weight models locally.

iot_devs5d ago

Do you have any example?

1 more reply

PeterStuer5d ago· 1 in thread

While I agree with some of the gist of the article, 2 remarks:

1. Unfortunatly in my tests the open models do not (yet?) rival, at least Claude Opus, for software development/engineering and adjacent tasks.

mirekrusin5d ago

Banning models in US just strengthens competing states, ie. China.

1 more reply

reacharavindh5d ago· 1 in thread

Don’t get me wrong. I wish I could run a local model and be happy about it. At the moment, I’m not.

hypfer5d ago

> if I am accessing an open model via hosted API, I might as well run a closed model via hosted API.

uh.. no?

The whole thing is that it cannot be enshittified, because there's not just a single party having control over it.

As it has happened, is happening and will happen.

You'll be talking to the same model with the same personality and same knowledge.

arttaboi5d ago· 1 in thread

I guess this will happen soon. There are two catalysts needed for this to happen:

1. Evals that can quickly tell you how much downside there is to switching 2. Something like OpenRouter that can help you run those evals quickly

alexhans5d ago

Exactly. I'm very happy the discourse has moved on from "but X model is the best" to "you can use open models".

Whether you're using SDK or harness based agents, having evals means you're able to modify any part of your agent and still know what satisfies your "good enough".

It's great for designing products that are easy to change as well.

Animats5d ago· 1 in thread

But it doesn't have to be an "AI company". It's just a compute service. The companies that offer web hosting could get into this.

flexagoon5d ago

> The companies that offer web hosting could get into this.

They already do. DigitalOcean is one of the providers on OpenRouter, for example

spiralcoaster5d ago

This can't be for real.

I wonder if I can get a post to the front page with the title: "There are no real barriers to humans colonizing Mars next month". And at the end, "I'm hoping there are no real challenges."

3 more replies

DrScientist5d ago

What's amazing about these models is they are effectively a distillation of the internet in something that can fit onto your local machine [1] and be queried via natural language.

[1] It seems inevitable that decent local models will be possible as the technology and the hardware is improving at a rate beyond the growth of the knowledge base to be distilled.

GL265d ago

anuramat5d ago

there is zero downside to not switching though: just use claude while it's good and subsidized, switch if rugpulled

1 more reply

implexa_founder4d ago

_pdp_5d ago

So while it is not complicated and certainly something that can be solved, it is not plug and play.

That being said, we switch to open weight models earlier this month and the results has been more than positive so far. The cost savings are also hard to dismiss.

c-b5d ago

What's confusing to me is that there is no discussion about the actual downside experienced it's just theoretical.

1 more reply

hungryhobbit5d ago

snootypoot5d ago

ZeroGravitas5d ago

It seems the best self-hosted and the worst models served by big providers has some considerable overlap in quality.

Whatever reason people have to run those (cheaper? backwards compatibility once you get something running) surely applies to the open models too, maybe even more so.

myzek5d ago

Any tips on which model to use and how to use them? I have 64 RAM and 16 VRAM (I know it's not a lot, it's a gaming GPU) and I'm trying to find a good model to use but it's a bit of a struggle

peter_retief5d ago

What open models are "recommended"?

I like the Linux analogy, I struggled with Linux way back.

petesergeant5d ago

Headline: "The is minimal downside"

Article: "I’m hoping it’s going to be minimal"

PcChip6d ago

Is it just me or is half the article missing?

I enjoyed the first part though

OtomotO5d ago

I am absolutely pro local and true open source models.

Personally I haven't seen any productivity gain since Opus 4.5 times.

I have NO moral objection to this, as Anthropic and "Open""AI".also trained their models on anything they could get their hands on.

It's more about the question: can and will these models be updated, even if Anthropic et al fail. Who's gonna pay for training then? What's their incentive? Have we reached a plateau?

1 more reply

cpill5d ago

aussieguy12345d ago

blindriver5d ago

epolanski5d ago

I unsubscribed from Anthropic and our (EU-based) team is moving to an "ai-server" running opencode + GLM 5.2 and DS4.

There are several benefits:

- we cut AI spending by thousands

- we can trust that the models are those that we run and not black boxes

- no more money flowing to US narcissistic entrepeneurs and no more business being tied to US legislation

Not gonna lie, GPT 5.5 Pro and Fable 5 were a tiny bit ahead, especially on longer vibecode-style tasks, but it's just not worth it.

root_axis5d ago

Imagine taking 6 months longer to release your cookie cutter CRUD app.

j / k navigate · click thread line to collapse