Broadly: No one is using the Chinese AI models. Everyone, globally, everywhere, including in China, is using the models from OpenAI, Anthropic, and Google. The models from the Big Three western labs represent >80% of all tokens processed and likely >95% of all revenue.
> Gemini is processing 746T per week
I read this totally differently. A startup nobody really knows is doing half a percent of Google on a commodity task?!? Google, which puts Gemini on billions of devices by default, without the user asking? Google, which is distributing Gemini to users who are unaware they are even using it?
Versus a startup that does not even have a login button on its homepage?
This is astonishing.
Not to mention, week on week more and more tokens are being processed via OpenRouter [0]. The number keeps going up with no end in sight, in my opinion. If the Chinese models continue to offer cheaper inference while trailing not too far behind, the line will keep going up.
[0] - https://openrouter.ai/rankings
OpenRouter is not the only "router" type AI company. More fixed providers like OpenCode and commandcode are offering subscription services on open/Chinese models, likely consuming billions of tokens each. Who knows how many tokens are being processed directly against DeepSeek's and Kimi's APIs.
DeepSeek's official API, which has a 10x cheaper cached input cost, isn't even on OpenRouter as a provider, so just like Google, most volume is not going through OpenRouter. (Gemini's official hosted API is on OpenRouter, BTW.)
Also, you're comparing an API with Google's internal corporate and consumer app use. ByteDance announced they were using 63T tokens/day (441T/week) at the end of 2025, so they are probably even higher than Google now. We don't know how many weekly tokens the DeepSeek chat app uses, but it would also be a very high number, much higher than the OpenRouter tokens.
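For anyone who wants to sanity-check the day-to-week conversion, here is a quick sketch. Both figures are just the numbers quoted in this thread (63T/day for ByteDance, 746T/week for Gemini), not independent measurements, and the variable names are made up for illustration.

```python
# Figures quoted in this thread; treat them as claims, not measurements.
BYTEDANCE_TOKENS_PER_DAY_T = 63   # trillions of tokens per day
GEMINI_TOKENS_PER_WEEK_T = 746    # trillions of tokens per week

# Convert the daily figure to a weekly one so the two are comparable.
bytedance_per_week_t = BYTEDANCE_TOKENS_PER_DAY_T * 7
ratio = bytedance_per_week_t / GEMINI_TOKENS_PER_WEEK_T

print(f"ByteDance: {bytedance_per_week_t}T tokens/week")
print(f"ByteDance / Gemini: {ratio:.2f}")
```

On those numbers ByteDance's end-of-2025 figure is a bit under 60% of the quoted Gemini figure, so "probably even higher than Google now" depends on how fast it has grown since.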
For the real reason of the recent price drops, go ask your AI about how much it would cost to run DeepSeek V4 or MiMo 2.5 after Ascend 950 PR have started to be mass delivered in 2026 Apr at $10k / card.
But you're right that OpenRouter is only one data point. It is, unfortunately, one of the few we have.
you could equally say, in the last complete week openrouter processed more deepseek tokens than any other provider including google
that also would not tell you much about how many tokens are used on deepseek
I mean, I am going to use the best I can afford. And at work that's Opus, but while work is happy to let me spend $50+/day, that's just not viable for personal hobby use, I need to keep that in the realm of a WOW/mmo subscription.
PS: Have not tried this, but DeepSeek V4 Flash (not even the DeepSeek V4 Pro version) set to "high" has pretty much Claude Opus 4.7 level capabilities and is lightning fast and dirt cheap. Hours and hours of conversation barely cost a few cents.
Very disproportionate intelligence-to-cost ratio.
I'm leveraging this temporary anomaly and using it as my coding workhorse.
I can easily run it in an 8-bit quant with the 4 x 48GB Radeon Pro W7900 GPUs I snagged for 2k each before the memory squeeze.
A 158B parameter model, especially in an architecture as efficient as DS4 is not that hard to drive currently if you got in before the craze, and will be relatively easy to drive with future hardware generations.
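A back-of-the-envelope check of that fit, as a rough sketch only. It assumes the 158B parameter count stated above, 1 GB = 10^9 bytes, and it ignores KV cache, activations, and quantization metadata, so real headroom would be smaller. The helper name is hypothetical.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


# Numbers taken from the comment above.
weights_gb = weight_memory_gb(158, 8)   # 8-bit quant: one byte per parameter
total_vram_gb = 4 * 48                  # four 48GB cards
headroom_gb = total_vram_gb - weights_gb

print(f"weights: {weights_gb:.0f} GB, VRAM: {total_vram_gb} GB, "
      f"headroom: {headroom_gb:.0f} GB")
```

That leaves roughly 34 GB across the four cards for everything that is not weights, which is why context length is the thing to watch on a setup like this.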
A big caveat here is that many US companies (particularly in sensitive industries, like defense) will likely not want to (or not be allowed to) use Chinese models for anything of substance.
I can only conclude that people who claim they are aren't doing anything close to the edge of what these models are capable of or any niche things.
I would say DSv4Pro is around the same level as Sonnet.
You are basically paying out the nose for a few seconds of VRAM residence if you are giving significant money for cache reads.
The very nature of autoregressive language modeling is that every single output token produced "reads" the cache.
So in principle the price floor for a cache hit is the flat cost of 1 output token.
Now in reality it has to be more than that because you are occupying VRAM with the cache that forces out other users. But it can still be really cheap.
And using up gpus for that cache is a pretty big opportunity cost. I highly doubt it's done in vram. That would be insane for the one hour caches.
So it's memory + the time it takes to unload/load into VRAM + the extra cost per output token.
Is it a scam? Idk
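To make the cache pricing argument concrete, here is a minimal sketch of how a per-request bill splits into cache-miss input, cache-hit input, and output. Every price below is a made-up placeholder, not any provider's actual rate; the only thing borrowed from the thread is the 10x gap between cached and uncached input. The function name and parameters are hypothetical.

```python
def request_cost(input_tokens: int, cached_fraction: float, output_tokens: int,
                 price_miss: float, price_hit: float, price_out: float) -> float:
    """Cost in dollars for one request; prices are dollars per million tokens."""
    hit_tokens = input_tokens * cached_fraction
    miss_tokens = input_tokens - hit_tokens
    return (miss_tokens * price_miss
            + hit_tokens * price_hit
            + output_tokens * price_out) / 1_000_000


# Hypothetical prices: cached input is 10x cheaper than uncached input.
PRICE_MISS, PRICE_HIT, PRICE_OUT = 1.00, 0.10, 2.00

# A long agentic turn: 100k tokens of context, 1k tokens of output.
with_cache = request_cost(100_000, 0.9, 1_000, PRICE_MISS, PRICE_HIT, PRICE_OUT)
no_cache = request_cost(100_000, 0.0, 1_000, PRICE_MISS, PRICE_HIT, PRICE_OUT)

print(f"90% cached: ${with_cache:.3f}  uncached: ${no_cache:.3f}")
```

With these placeholder numbers the cached request costs about a fifth of the uncached one, and the cache-hit line is still the second-largest item on the bill. Whether that is fair depends on the storage and reload costs discussed above.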
Mimo cost ~$400 at the old price, so about $40 today. Opus cost ~$5000
That's over 100x cheaper, and just 3 points behind.
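The arithmetic behind "over 100x", using only the rounded dollar figures quoted above (variable names are just for illustration):

```python
# Approximate benchmark-run costs in dollars, as quoted in the comment above.
mimo_old_cost = 400
mimo_new_cost = 40
opus_cost = 5000

price_drop = mimo_old_cost / mimo_new_cost   # how much cheaper MiMo got
opus_vs_mimo = opus_cost / mimo_new_cost     # Opus cost relative to MiMo today

print(f"MiMo price drop: {price_drop:.0f}x")
print(f"Opus vs MiMo today: {opus_vs_mimo:.0f}x")
```

So the gap was about 12.5x at the old price and is about 125x at the new one.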
I can't wait to experiment with an llm consortium of 100 deepseek and mimo models. Crazy times.
Shut up and take my m̶o̶n̶e̶y̶ data!
Edit: Gemini on google search told me I could write strikethrough text on hn using <s>. Mimo told me it was unsupported and then went on to list some tags that are supported, like <b>bold</b>. I tried copy pasting the word in strikethrough from a word processor but it lost the format. I ended up using mimo in an agent shell wrapper to produce it, and copy pasting from the terminal worked for some reason.
MiniMax (currently 2.7) which is a ~270B model tuned exclusively for agentic purposes, performs so MUCH better; it's more reliable and cheaper. Both are still far away from Opus 4.7 that I'm using at work. IMO benchmarks are just a very rough estimation; everyone cheats as much as they can get away with. Test the model yourself; do not make any assumptions based on the benchmarks.
I would love to see specialized, cheaper, bleeding-edge models like MiniMax for other non-agentic purposes as well. Why pay $1 for a general model when, for example, you can pay $0.1 for a content-moderator model that you actually need?
This is waaaaay more constrained than even Claude Pro plan, let alone Deepseek V4 or Kimi K2.6 pricing.
Does this work: s̶t̶r̶i̶k̶e̶t̶h̶r̶o̶u̶g̶h̶
Chinese models incidentally slurp up some terms that lead them to find unflattering words that you wrote about the CCP in a random journal entry, or maybe a social media CSV export. You go to China one day and are denied entry due to what you said.
Realistic or no? (yes i know the us is getting bad in re. to what you write online as well)
Models hosted in China are a siren call that I don't feel bad about resisting.
Chinese people can’t really do the same.
> How realistic is this:
Completely unrealistic unless you are a high value target (journalist, spy, business man, etc...)
Besides, the Chinese government doesn't really care about individual criticisms, even in public, especially in languages other than Chinese. What they really care about censoring is attempts to organize collective action. They don't care about personal opinions stated in the blog posts of tourists, let alone diary entries.
I really like the US model of free speech, at its best. It feels natural and right to me. It would be cool if Chinese people had stronger freedom of political speech. I'd love to hear Chinese people publicly share their thoughts online without restraint or censorship; it's a huge country with a lot of smart people with diverse opinions.
But maybe you should go visit China sooner rather than later, tbf. It's friendlier and weirder and more interesting than you think, including w/r/t the censorship regime.
At least the Xiaomi models are open weights and you can host them yourself, avoiding such concerns.
> CBP denies travelers entry because of anti-Trump comments
It's possible they've finally integrated cheap(er) chinese chips. It's also possible they're just subsidising inference for real-world usage data. Interesting either way.
Like I responded to someone else:
- Cheap electricity
- Cheap, domestically produced GPUs
- Efficiency research (a lot of it from DeepSeek's research)
Also, the Chinese government wants the AI to be as accessible as EVs so everyone will use it.
Only artificial barriers will keep people using some of the frontier stuff in a couple of years. No costs will justify.
Same reason they release some of the models for free: They are trying to capture market share.
Their only moat is maybe being SOTA but that only lasts so long before everyone else catches up.
Also, DSv4 has access to Huawei Ascend GPUs that have native FP4 that allows all-native FP4+FP8 mixed compute that is more efficient than emulated FP4. Less so for 3rd party providers.
My plan was just upgraded to 38 BILLION tokens per month. That's at least 10X the tokens I've used in my entire agentic development so far.
I should probably downgrade my plan, but we'll see. :)
For example, I've heard DeepSeek v4 Pro is comparable to Sonnet 4.7, so I just bought some credits to try it out.
MiMo is the best one I've used so far, but I haven't done anything interesting with the Claude 4.7 models. It seems conservative with generally good "instincts", getting things working quickly without too much complexity. I've also embedded it in several different projects so far, and it's been pretty easy and effective.
It's funny to think that the US companies are hiking prices while the Chinese ones do the opposite. It's obviously a strategy, but still pretty funny.
DeepSeek does not understand image, audio or video.
Deepseek made the discounted price permanent before this.
The question is how they are managing to do so? They are supposed to struggle due to chip sanctions.
Secondly, why now? The US companies were supposed to subsidize too, but now they are unable to keep up. Everyone is going to usage-based pricing, so subsidizing must be unsustainable for them, even though they are well funded too.
If there is a genuine hardware breakthrough reducing compute needs, then that is good for the whole world, I believe.
As Jensen has been pointing out for almost a year now, these sanctions were ineffective and probably had the opposite effect of the desired goal.
The history is fairly long, but an inflection point could likely be traced to Trump v1 era DOJ enforcement on (among others) Huawei's CFO Meng Wanzhou in 2018. Huawei was hit with the (really big) stick in international transactions: OFAC violation accusations, and it was a seminal moment in the company's internal operations -- they concluded they needed a fully internal supply chain in China, and retooled for it. Meng Wanzhou cases in the US were eventually dismissed, but she was on house arrest in Canada through 2021 or so.
Fast forward to 2024 -- Huawei was culturally and technically ready to build AI accelerators -- one of the externalities of the sanctions was to provide additional benefit to Chinese companies for buying from Huawei; those economics seem to have provided a boost to on-shore development.
I have been using DeepSeek, and I am finding it better than Claude or Codex, to be honest.
I don't see myself going back.
1. Some companies are very good in training and serving at much lower cost
2. Some companies have access to new much cheaper hardware
3. People have realized that you don't need a 3.2T model when a 310B one (Opus vs MiMo 2.5) performs equally well for your particular task.
So at this point I'm not even looking at lesser models to save on costs or usage; I'm using lesser models because they fit the task more than fine and will nail it. Six months from now, though, the model landscape will be totally different, and costs will not have gotten better (for US companies), because their priorities are almost entirely on chasing model capability.
So i hope you're right and the overall market is moving the direction you mention, but I think the US will continue this absurd race to... just being #1 regardless of how much it stops making sense.
that has been the model of software since, like, ever
as someone who now lives & has lived in the west for the majority of their adult life - yeah the US western models r fucked n the crazy valuations of the A.I labs - which also filters down to the economy - since all money instead of being put to productive use is being wasted on this shit. hell electricity bills are up - cz datacenters need power. the current crooks in power don't believe in clean energy.
I stopped tagging my country as developing and then third world and call it for what it is, a POOR country. I know with increasing certainty that my country will be poor for the rest of my life. I also expect AI to be as available as computers: there are the "have", and there are the "don't have", which is almost always a lifetime condition.
Every industry-wide scale technological revolution has happened because government funded a technology and then opened it up to the masses. Just look at your iPhone: GPS, the internet, AI voice assistants, touchscreens, microprocessors, lithium-ion batteries, etc all came from gov't research (I'm counting Bell Labs' gov't mandated monopoly + research funding as gov't)
Economist Mariana Mazzucato wrote a great book about this called The Entrepreneurial State: Debunking Public vs. Private Sector Myths
I really don't think China cares about that. Chinese government's governance logic is making everything so cheap that everyone can get and use it. They did it with EVs and other things. Now they are doing it with the AI.
They aren't aiming at companies but at users, many of whom have no common sense and grant these agentic AIs access to everything.
All the restrictions the US imposed on China will be reverted, and it will be even worse, because now the data is not reaching the US gov (we all know they have access to US big tech's data) but China.
I really hope this goes viral and breaks Nvidia/OpenAI.
This seems great! Between just these two providers, this is a couple pairs of models that seem suitable for replacing Claude Sonnet and Claude Haiku, at around 1/20th the price.
It's a bummer for me that nothing can match at least Opus 4.6 or GPT-5.5 yet, since I'd characterize those as the first models to actually be good enough to be useful for writing code, at least in my experience at work.
But for simple stuff, or situations where you can have the huge model dispatch to subagents or just "advise" or "supervise" smaller agents on their work, this looks great. Wherever the frontier models end up in a year, if there are open-weight contenders like this around GPT-5.5's level by then, I think I can be happy and productive doing most prototyping with those models and hand-editing for quality or more serious work.
From their docs "After using 10M input (cache miss) tokens of MiMo-V2.5-Pro, it is equivalent to consuming 3000M Credits, and you can still enjoy 1100M Credits of MiMo-V2.5". So it's around 12M input credit vs Earlier 60M tokens.
This is why Anthropic wants these Chinese AI models banned: they are in the lead in the AI race to zero, and Anthropic knows that there is no model moat.
So don't tell Dario.