That's why I'm using eurouter.ai with the following routing rule for all my requests:
{
"model": "glm-5.2",
"models": [
"deepseek-v4-pro",
"deepseek-v4-flash"
],
"provider": {
"allow_fallbacks": true,
"data_collection": "deny",
"data_residency": "EU",
"max_retention_days": 0,
"eu_owned": true
}
}
Sure, it's quite expensive, but at least on a legal side data privacy is ensured. I trust them more than e.g. Anthropic, OpenAI or OpenRouter.Personally, I find it morally unacceptable to use U.S. AI tools, because I do not want to support them financially and thus support the crimes they are involved in[1].
What gets me the most is that they claim that the model should follow the https://www.anthropic.com/constitution and they claim that it's embedded into the model. However, system prompts in claude code and cowork re-iterate all of these points and if they're embedded you shouldn't need to do that. Now, if you ask the API version of claude to be a hitler supporter with enough prompt engineering it will become one which directly contradicts what they claim to do, opus 4.7 specifically will be happy to create anti-(insert minority group) propaganda although I haven't had the same success with 4.8 thus far, but I also haven't been motivated enough to push it in that direction yet since I've been more interested in exploting the cyber capabilities of the model.
My conclusion from the very start is that Anthropic's strategy are pure optics and considering the fact that there was an outpoor of support for the company I think it has been very successful.
On second thought, it's not funny.
Regardless of Anthropic's "moral" position (inasmuch as a corporation can even have morals) against spying on non-Americans, they would have no way to enforce that limitation against the government because non-citizens outside of the USA have no protections from the intrusions of the US government.
Alleged red lines. Could be just talking points for garnering sympathy. Big tech aren’t exactly known for being truthful, especially big tech partnering with esteemed Palantir.
And this is coming from a CEO who constantly claims moral superiority and advances the idea that China is bad
- The prices are ridiculous (15 % markup for free account).
- They have a rate limit of 1000 requests per month, unless you pay 40€ per month for ... what exactly is their value proposition?
- They have a single provider (TensorX) for DeepSeek-V4-Pro, with a cache read cost that is over 100 times higher than DeepSeek ($0.44 vs $0.003625). Notably, I had to look at the TensorX website for that information, since I could not find any information about cached token cost on eurorouter.ai.
If there aren't enough businesses who want to do this, the EU should figure out how it can properly incentivise that to change.
Low carbon does not equal expensive, either. Solar is the cheapest power generation method. Solar plus grid scale batteries is in the same cost ballpark as natural gas.
There’s nothing about data centers that is inherently a high carbon business. It’s only a high carbon business in places like the US where political leadership purposefully fights against renewable energy projects that private businesses want to undertake on their own dime.
EURouter (Amsterdam): https://www.eurouter.ai/pricing
Eden AI (France): https://www.edenai.co/pricing
nexos.ai (Lithuania): https://nexos.ai/pricing/
Requesty (Germany): https://www.requesty.ai/pricing
Cortecs (Austria): https://cortecs.ai/pricing
Nordference (Estonia): https://nordference.ai/pricing
Guess those are really popping up as mushrooms, eh? Not an endorsement of any of those on my part cause I haven't personally used them, but seems like there are at least options for those who need them.
"AI-assisted targeting in the Gaza Strip" - https://en.wikipedia.org/wiki/AI-assisted_targeting_in_the_G...
"Palantir allegedly enables Israel's AI targeting in Gaza, raising concerns over war crimes" - https://www.business-humanrights.org/de/neuste-meldungen/pal...
"What The Wounds Are Telling Us" - https://www.volkskrant.nl/kijkverder/v/2025/gunshot-palestin...
> Limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered.
> Change applies to organizations that have set up workspaces with zero data retention (ZDR) in Claude Console, use Claude Code with ZDR in Claude Enterprise, or access Claude through AWS Bedrock, Google Cloud Agent Platform, or Microsoft Foundry with ZDR.
https://support.claude.com/en/articles/15425996-data-retenti...
That's a pretty big downside if data privacy and sharing is one of the main concerns.
Do you have a sound reason to need EU data locality? You can.
Do you want the confidence (and are willing to accept the expense) of only running models on local hardware you control? You can.
Do you want the cheapest possible option - choosing a Chinese, for example, provider, or perhaps a provider offering it for free where you agree they can use your prompts? You can.
Do you need to comply with some kind of regulation like GDPR or rules for contracting with the U.S. federal government? No problem. (Although I'm still waiting for DeepSeek V4 to show up on Amazon BedRock so it can be used from GovCloud...)
Do you have moral objections and want to actually live by them? You can.
Models are converging, but they converge in bands, and frontier is frontier. I would not like to have any workflows in any area of my business where output is generated by an assortment of models from different providers. For trivial, mundane tasks that might be fine, but it certainly doesn't apply across the board.
Maybe it was funny to you, but designing data platforms that respect GDPR and involve LLMs is a thing.
This seems tautological because Europe is pretty weak on the values that people in the US might care about (freedom of speech, limited govt, etc).
What values specifically are you optimizing for here?
Edit: c'mon people, if you're going to use such ambiguous phrases at least have the spine to clue the reader in to what you want them to refer to in this context.
The age old joke;
A Russian and an American are drinking at a bar
The Russian says "I'm impressed by american propaganda. It's so subtle but effective."
The american responds "What are you talking about, we don't do propaganda."
https://eternallyradicalidea.com/p/the-situation-for-free-sp...
Why not host in east asia? Or southeast asia? Or south america? Or africa? Then you avoid both the government with incentive to spy on you (assuming you live in the EU) and american companies.
I know LLMs move at the speed of light (especially these past few quarters), but if Opus and GPT "a few months ago" were really like open weight models, then there's really no reason to not switch, especially for those who were using these models a few months ago.
Your codebase didn't change, so use the open weight model. Don't move the goalposts.
So yeah, I'm totally fine using Kimi-2.7, GLM-5.2 or Deepseek-v4. I think we've already hit the ceiling and most improvements now seem to be from harness improvements and slightly better RL to improve reasoning/tool calling.
Let’s say I’m a bad faith LLM operator, and I want to degrade my model so the next release looks better and people want to switch to the more expensive one. How would I do that?
The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.
Even with minor automation I feel like I can watch OpenAI and Anthropic engineers fiddling in real-time. Tuesdays behaviour changes by Thursday, 10AMs production isn’t possible at 11:30AM. Nutty.
Which is what I suspect the providers are doing to fit more inference on the same amount of hardware over time.
https://marginlab.ai/trackers/claude-code-historical-perform...
There were at least a couple of these degradation trackers.
I experiment a lot with the open models and I’m getting tired of this trope. I’m not yet convinced that even the best open weight models are equal to Opus from “a few months” ago.
I know what the benchmarks say. I had higher hopes. My real experience just doesn’t match the benchmarks.
I also do a lot of work that even Opus 4.8 struggles with. When even the cutting edge LLMs aren’t all the way there yet, my motivation to switch to something even further behind just isn’t there.
5.2 lives up to the hype. I don't find it to be the best at anything except coding. But for coding... yeah, it lives up to the hype. Not quite Opus 4.8-level, but I would feel comfortable comparing it to 4.5, at least if it had vision capabilities.
That's exactly the problem I have... with Anthropic and "Open""AI"
The moat is so flat, it only gives +1 food and +1 production. +1 gold with a road.
The really interesting thing is that it's typically those very same accounts who were explaining, a few months ago, that thanks to their commercial model they were gaining so much time and producing so much fantastic code.
A few months passes and suddenly the open-source model have caught up with the models that were gaining them so much time and that produced amazing code (in production everywhere for sure btw) but... It's impossible to work with these models.
Rinse and repeat.
The current models, according to them, are basically AGI and they can go fishing while paid subscriptions solve the world's problems.
But when it six months there shall be new closed, pricey, models and when the open ones shall have reach the level of Fable, we'll hear how it's impossible to work in late 2026 on a model that is "only at the level of Fable".
These people should have been snake-oil salesmen (and it could be what they actually are).
Not unusual in the tech space, but this has been basically constantly happening for two years now? I can't imagine the improvements are more than incremental at this point.
Just like the OS ecosystem I think we'll see a similar trajectory with OAI, Anthropic and Google but on a much accelerated time scale. I think the lobbying has begun to lock in their fate for revenue - because none of them give a shit about their users. I do hope, however, that Anthropic continues to over rotate and continue to gimp their models into uselessness. I just asked Opus 4.8 the other day to look at some code as an adversary and summarize areas that should be addressed. Nothing specific and it shut down the conversation. However starting a new prompt and prodding the model from a different angle yielded the results I asked for directly. Pick a lane. Or, don't and continue to lose industry respect and consideration.
not all of us are doing noob shit lol
Personally, I don’t like the change, but it’s just how technology works so I’d rather move with the flow than try to stick my foot down and freeze time.
Yes but why does that matter? If I am happy with its capabilities now, I will continue being happy with its capabilities in the future.
Yes, it cannot do the newest magic shit, but why does that matter? It can still do everything that existed up until that point, which is _a lot_.
Eventually, you might also need something new, but it's not like the world shifts over all problems that exist from <old> to <new> and any tech for <old> problems suddenly becomes obsolete?
I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".
I have given up on making Opus actually retrieve online information for me. At this point I only query it side by side with qwen to laugh at how it didn't even attempt to search properly, and how a small local model is beating it every time. Gemini is very fast for searching, but somehow miss-sources all the time.
The things you describe are just tool calling, they're a feature of whatever harness you use. Use OpenCode, pi.dev, or maki.sh with any of the open models.
> I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".
You can do most of this with some system prompts added to whatever agent you're using. You can do it from the settings on the claude/chatgpt websites too. (minus the no-guardrails thing)
First time I did this I realized in 5 seconds that the big players weren’t going to be carving up the market between them.
I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.
There's a post at the top of /r/localllama about this exact math right now: https://www.reddit.com/r/LocalLLaMA/comments/1ubrcwj/tokenom...
TL;DR: Running GLM 5.2 is going to cost about $20K minimum, and that's going to be painfully slow compared to the cloud hosted versions. Even the estimates where the server is computing tokens 24/7 you can't break even for several years.
The only reason to run locally is if complete data privacy is your top concern. You pay a high premium for that.
The appeal to me is that we can run that, but we can also run smaller models on your laptop _and it’s functional!_ I can run DeepSeek v4 flash and a qwen 3.6 on my laptop! Thats crazy good.
and what hardware are you using?
$10 a month gets you generous usage with the best open weight models and they claim to have zero retention and not to train on your usage.
It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.
The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.
The advantage of OpenRouter compared to OpenCode Go is that the price for DeepSeek-V4-Pro and MiMo-V2.5-Pro is better on OpenRouter.
For example, DeepSeek costs $0.435/0.87/0.003625 for 1M in/out/cached tokens (https://openrouter.ai/deepseek/deepseek-v4-pro), compared to an equivalent of $1.74/3.48/0.0145 under the OpenCode Go plan (https://opencode.ai/docs/go/#usage-limits), almost exactly 4x.
But since you get a monthly usage limit of $60 with the OpenCode Go plan for $10 (i.e. 6x), you might still come out ahead if you use it a lot (or use other models, where the pricing difference is smaller or non-existent).
“The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.”
Opencode Go gives you a choice between “the best” open weight models and you’re not tied down to just GLM or MiniMax and Zen gives you an even longer list of providers including Claude and GPT?
Is it that Openrouter gives you access to like… every model and provider?
Right now, due to profound shortfalls in both data and hardware compared to the US labs, the OSS models are IMO basically technology demonstrators that in practise are even more jagged than the US labs' efforts. The high points of the jaggedness are close - but number of happy paths is many times fewer, and their behaviour inside the harness is far less refined. Barring some incredible breakthrough I don't think that is changing without a much higher level of resources - which seems impossible given the current hardware environment.
I have no reason to think that Anthropic or OpenAI are in possession of some secret sauce that the Chinese labs can't duplicate given the right resources, but the fact remains that absent those resources they'll remain behind. Barring some incredible bombshell reveal from Huawei I don't think this asymmetry resolves in a year. In three years it may well be a different story.
But the question was about whether the Chinese labs will have fable-equivalence in 1 year. I am by no means some kind of insider, but knowing the vaguest outlines of what went into Mythos, they just can't do it. The compute is not there. The Chinese engineers are incredible, but they're not literal magicians.
Of course there could be something incredible to come out of left field and overturn the apple cart yet again, but that's speculation. It would be awesome, sure! But I wouldn't bet too heavily on it.
And FWIW - again, no disrespect at all to the Chinese engineers but I don't rate GLM5.2 as being even close to opus 4.6. It can hit a few benchmarks, sure, that's the top edge of the "jag". But filling in the rest of the capabilities - again, it takes compute and data the OSS labs just don't have, that anyone knows about at least.
How did we get from prising software freedoms to this?
I don’t think the hardware requirements are relevant. If a research lab publishes the code their particle collider runs under the GPL, that doesn’t make it not OSS even though they’re the only ones on the planet with the hardware to run it.
On the spectrum of:
careful engineering--hacking--mad science
This kind of thing falls far towards the mad science end of the scale, but has proven effective.Having played a bit with Fable, reinforced the above.
This certainly seems feasible for open weight models eventually, but I'm still extremely skeptical of the claims about reaching this level with any open weight model that can be run locally (nevermind the hardware costs to do so practically).
Sure, there may be some cases and reasons for local models and industry is so large they will continue to make progress and gather economic value and users for specific use case; but frontier will command vast majority of the economic value distinct from Linux and open source where the model created better than proriatary economic incentives around development
Also, on that note. Not every company needs 10x developers, just as not every task needs frontier llms. Ultimately, operating costs will be the largest contributing factor.
Ultimately its a financial game. Open source is far cheaper so it already has an upper-hand. Frontier models have to justify financially why they are worth the additional spend.
> I’m hoping it’s going to be minimal.
I have multiple subscriptions and I pay per token to try out different LLM providers through OpenRouter. I also run open weight models locally.
I just can’t agree yet. The models from Anthropic and OpenAI really are that much better than anything else. The open weight models must be universally benchmaxxed across the board because my real world experience with them is very different than what the benchmarks imply. I get downvoted a lot for speaking about my experience because I don’t think it’s the reality that people want to hear right now, but it’s true for complex work.
I do think there are a lot of easier tasks that can be handled appropriately by the open weight models in the hands of a skilled operator. If an entire job is simple enough that you wouldn’t hesitate to hand it off to a junior with a little supervision then any model will do. However for a lot of the work I do, even Opus 4.8 on Max requires a lot of attention and extra steering and review to keep it on track. Fable did, too, though to a lesser degree. When I try to use the big open weight models (hosted, because they’re not running at reasonable speeds locally at a quantization I can tolerate) it feels like I spend more time waiting while they burn tokens for output that I probably have to reject anyway, at least for the bigger tasks. I wish they were there, but that’s not the case yet.
1. Unfortunatly in my tests the open models do not (yet?) rival, at least Claude Opus, for software development/engineering and adjacent tasks.
2. Enjoy while it lasts. I'll be genuinly amazed these open models will not be declared 'illegal' under some security pretense by the end of the year. And I say 'pretense' because the primary driver will be regulatory capture and industry protectionism.
Don’t get me wrong. I wish I could run a local model and be happy about it. At the moment, I’m not.
uh.. no?
The whole thing is that it cannot be enshittified, because there's not just a single party having control over it.
As it has happened, is happening and will happen.
With open weights, you cannot easily be rugpulled or locked out or any of that stuff. If the corp attempts that, someone else with an server farm will gladly take you as a customer with absolutely 0 changes to your workflow other than swapping out the API URL + Key.
You'll be talking to the same model with the same personality and same knowledge.
1. Evals that can quickly tell you how much downside there is to switching 2. Something like OpenRouter that can help you run those evals quickly
Now #2 is starting to become popular, and I think we'll soon see more people adopting a model-agnostic approach. Of course, there will still be high-intelligence use cases where nothing comes close to Claude or GPT.
Whether you're using SDK or harness based agents, having evals means you're able to modify any part of your agent and still know what satisfies your "good enough".
It's great for designing products that are easy to change as well.
But it doesn't have to be an "AI company". It's just a compute service. The companies that offer web hosting could get into this.
They already do. DigitalOcean is one of the providers on OpenRouter, for example
The title asserts there is minimal downside to switching to open models, but the article provides zero evidence that this is true, and the author hasn't even attempted it yet. The end of the article states "I’m hoping it’s going to be minimal".
I wonder if I can get a post to the front page with the title: "There are no real barriers to humans colonizing Mars next month". And at the end, "I'm hoping there are no real challenges."
[1] It seems inevitable that decent local models will be possible as the technology and the hardware is improving at a rate beyond the growth of the knowledge base to be distilled.
Think about it, if you want to use an open source model to run an agent over long horizons, you need someone to offer it reliably. And if all compute goes to frontier models there is very little left for even startups to build agentic companies on top of these open source models.
So while it is not complicated and certainly something that can be solved, it is not plug and play.
That being said, we switch to open weight models earlier this month and the results has been more than positive so far. The cost savings are also hard to dismiss.
Whatever reason people have to run those (cheaper? backwards compatibility once you get something running) surely applies to the open models too, maybe even more so.
I like the Linux analogy, I struggled with Linux way back.
Article: "I’m hoping it’s going to be minimal"
I enjoyed the first part though
Personally I haven't seen any productivity gain since Opus 4.5 times.
But: I can't fully get behind the opinion that (so called) "open source models" are simply superior and will be in the future, because when I asked some models who they are, they answered with "I am Claude from Anthropic", which could mean they have been trained by exfiltrating Claude.
I have NO moral objection to this, as Anthropic and "Open""AI".also trained their models on anything they could get their hands on.
It's more about the question: can and will these models be updated, even if Anthropic et al fail. Who's gonna pay for training then? What's their incentive? Have we reached a plateau?
For a while during this era, I used to port my laptops windows installation into a virtual machine that can run on Linux. It took a bit of hacking away but I could usually do it in a day or two. Then its all Linux with the windows vm being used for the microsoft stuff.
A $10,000 RTX 6000 Blackwell card will pay for 500 months of Claude or Codex, which is 40 years worth of compute. Obviously they are going to raise their prices, my prediction being to $200-500/month, but that still makes them at least years of compute and they scale very well with more traffic. Single GPUs do not, they are pegged at 100% and good luck getting it to answer multiple queries at the same time.
There are several benefits:
- we cut AI spending by thousands
- there is one AI server and starting different sessions for each user, one memory/skills/etc and everybody is involved into reviewing what went wrong and why. Harness finally makes sense and pays off more.
- we can trust that the models are those that we run and not black boxes
- no more money flowing to US narcissistic entrepeneurs and no more business being tied to US legislation
Not gonna lie, GPT 5.5 Pro and Fable 5 were a tiny bit ahead, especially on longer vibecode-style tasks, but it's just not worth it.