> 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found
Instead of 1192 GPUs they now use 213 for serving those requests.
I guess I’d assumed this sort of thing would be allocated dynamically. Of course, there’s a benefit to minimizing the number of times you load a model. But surely if a GPU+model is idle for more than a couple minutes it could be freed?
(I’m not an AI guy, though—actually I’m used to asking SLURM for new nodes with every run I do!)
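For what it's worth, here is a minimal sketch of the idle-timeout idea from the comment above. It is a toy in plain Python, not how Alibaba or any real inference engine does it: `IdleModelPool`, `idle_ttl`, and the model names are all made up for illustration, and "loading" or "unloading" a model is only bookkeeping here.

```python
import time
from typing import Callable, Dict, List


class IdleModelPool:
    """Toy registry that unloads models idle longer than `idle_ttl` seconds.

    Hypothetical illustration only: no real GPU memory is managed.
    """

    def __init__(self, idle_ttl: float, clock: Callable[[], float] = time.monotonic):
        self.idle_ttl = idle_ttl
        self.clock = clock
        self.last_used: Dict[str, float] = {}  # model name -> time of last request

    def request(self, model: str) -> bool:
        """Record a request; return True if it was a cold start (model not loaded)."""
        cold = model not in self.last_used
        self.last_used[model] = self.clock()
        return cold

    def evict_idle(self) -> List[str]:
        """Unload every model whose last request is older than `idle_ttl`."""
        now = self.clock()
        idle = [m for m, t in self.last_used.items() if now - t > self.idle_ttl]
        for m in idle:
            del self.last_used[m]
        return idle


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
pool = IdleModelPool(idle_ttl=120.0, clock=lambda: fake_now[0])
pool.request("model-a")        # loaded at t=0
fake_now[0] = 60.0
pool.request("model-b")        # loaded at t=60
fake_now[0] = 150.0
evicted = pool.evict_idle()    # model-a idle for 150 s, model-b for only 90 s
```

The catch is that the next request for an evicted model is a cold start again, so the timeout trades GPU-hours against reload latency.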
If you're using an efficient inference engine like vLLM, you're adding compilation into the mix, and not all of that is fully cached yet.
If that kind of latency isn't acceptable to you, you have to keep the models loaded.
This (along with batching) is why large local models are a dumb and wasteful idea if you're not serving them at enterprise scale.
At the scale of a hyperscaler I think Alibaba is the one that would be doing that. AWS, Azure and I assume Alibaba do lease/rent data centers, but someone has to own the servers / GPU racks. I know there are specialized companies like nscale (and more further down the chain) in the mix, but I always assumed they only lease out fixed capacity.
I've assumed that as well. It makes sense to me since loading up a model locally takes a while. I wonder if there is some sort of better way I'm not in the know about. That or too GPU poor to know about.
It's likely that these are small, unpopular (non-flagship) models, or that they only pack e.g. one layer of each model.
14.5% is worth a raise at least. But it’s still misleading.
"A paper presented at SOSP 2025 details how token-level scheduling helped one GPU serve multiple LLMs, reducing demand from 1,192 to 213 H20s."
Which, if you scale it, matches the GP's statement.
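Spelling out the scaling, using only numbers quoted in this thread. The one assumption (flagged in the code too) is that the 1,192 GPUs are the same 17.7 per cent slice of the fleet mentioned in the article; the paper may define the two differently.

```python
gpus_before = 1192   # H20s serving these models before pooling (quoted above)
gpus_after = 213     # H20s after pooling (quoted above)

# ASSUMPTION: the 1,192 GPUs are exactly the 17.7% of the fleet from the article.
pool_share_of_fleet = 0.177

saved = gpus_before - gpus_after                      # GPUs freed up
reduction_in_pool = saved / gpus_before               # saving within that slice
fleet_saving = reduction_in_pool * pool_share_of_fleet  # saving across the whole fleet

print(f"GPUs saved: {saved}")                              # GPUs saved: 979
print(f"Reduction within the pool: {reduction_in_pool:.1%}")  # 82.1%
print(f"Share of the whole fleet: {fleet_saving:.1%}")        # 14.5%
```

So under that assumption, an 82% cut on the long-tail slice works out to roughly 14.5% of all GPUs.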
thanks to the US restrictions on semiconductor industry (Chinese), Chinese engineers are being forced to innovate and find their own ways to overcome challenges like the old school engineers (What Silicon Valley used to be)
That said, I'm not sure what the US policies specifically have to do with this. Countries are always in competition with one another, and if one industry or technology is considered a national security threat they will guard it.
> However, a small handful of models such as Alibaba’s Qwen and DeepSeek are most popular for inference, with most other models only sporadically called upon. This leads to resource inefficiency, with 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found.
In many senses there's hubris in the western* view of China's accomplishments: most of what western companies have created has had significant contribution by Chinese scientists or manufacturing, without which those companies would have nothing. If you look at the names of AI researchers there's a strong pattern even if some are currently plying their trade in the west.
---
* I hate the term "western" because some "westerners" use it to separate what they think are "civilized" from "uncivilized", hence for them LATAM is not "western" even though everything about LATAM countries is western.
While I don't disagree with your overall point, it's important to recognize that this is only a phenomenon of the last ~30 years, and to avoid falling into the trap of Han racial chauvinism. E.g. there were ~no Chinese scientists in Germany in the 70s but they were heavily innovating nevertheless.
It’s funny - it’s at the point with Chinese manufacturing for niche electronic goods (e.g rooftop van air conditioner) where some Chinese brands are more trustworthy - more value for your money and sometimes even better overall quality. With American brands you gotta make sure you’re not overpaying for dated tech that is inefficient. Maybe the same will happen with LLMs.
It's worked for a very long time for aircraft.
China has been pushing to build its own aircraft for >23 years. It took 14 years for COMAC to get its first regional jet flying commercial flights on a Chinese airline, and 21 years to get a narrow-body plane flying a commercial flight on a Chinese airline.
If only for political rather than technical reasons, COMAC may still be decades away from being able to fly to most of the world.
Likewise, in ~5 years, China may be able to build chips that are as good as Nvidia's after Nvidia's 90% profit margin - i.e. they are 1/10th as good for the price - but since they can buy them at cost, they're the same price for performance and good enough.
If only for political reasons, China may never be able to export these chips to most of the world - which limits their scale - which makes it harder to make them cost effective compared to Western chips.
Your first statement is not likely unique to China though, even though they have demonstrated that in about the last 40 years, which I don't really think qualifies as "history". What it does demonstrate is that societies that have a certain kind of ethnic self-respect and can cast off the detrimental influences of foreign, hostile, and even enemy elements to pursue their own self-interest and survival will succeed, regardless of hurdles placed before them.
It's really just a story of personal development and of escaping, evading, and avoiding detrimental, toxic people and their behaviors. All of humanity, which currently still has to share a single planet with ZERO save spots, would be better off if we all just allowed each other to be ourselves in our own places without others subverting, subjugating, infiltrating, dominating, poisoning, or polluting any other people on the planet. Then everyone can decide if we want to be friends or not friends with each other, collaborate and be friendly or simply avoid each other. We do not have to like each other to get along if everyone agrees on a base understanding that no people can parasitize and abuse and manipulate any others.
I don't think you can really produce a definite counterfactual that they would or wouldn't have taken longer or shorter without it, but certainly they were pushing for self sufficiency long before technology restrictions. But we're not going to be handing our technologies to our competitors on a silver platter, and it's also best for businesses to start weaning themselves off the Chinese market. Virtually every market reliant on them today is in big trouble.
As for hubris, I think that's more a projection on your part. If you want to start bringing up race cards with regards to contributions, that kind of argument would be applicable to everyone. And AI research is highly diverse and international, Chinese names don't dominate the list more than Turks, Greeks, Malaysians, etc.
Why would I do that tho? If we look at the names of scientists/researchers/engineers/businessmen, the conclusion would be that the US has contributed nothing to the world. Europeans did all the hard work!
Also, isn't this the usual path to better computer science? Reducing computation needs by making better/more efficient algorithms? The whole "trillions of dollars of brute force GPU strength" proposed by Altman, Nadella, Musk et al just seems to reinforce that these are business people at heart, not engineers/computer scientists...
Concepts that enable the individual should empower a chosen configuration of society not the other way around.
Contrast this with non westernism where either education of the individual is not valued or the state is the primary goal over the individual.
I’ve worked with states, governments and individuals around the world for 20 years and find this a very useful definition. What’s confusing is the nations who have half adopted westernism but don’t fully due to either caste systems or government dominated thinking.
It’s an arrow towards rationalism over tradition, individualism over collectivism, flatness over hierarchy, and future over past. But only to the limit of the resources any given society has.
China for sure will catch up, the question is what they will do with it. They're not ambitious like the US/West. The US wanted influence all over the world as an extension of the cold war and to keep economic interests safeguarded. But China just doesn't operate that way. They're more hands-off. They could be opening up Alibaba Cloud datacenters all over the US, offering it as an AWS/Azure alternative, funding tons of startups all over Europe, the US, etc. to exert their influence, but they won't. They have a more long-term low-and-slow approach to global domination. The "100 year marathon" as they called it, which they'll win for sure.
China's greatest weakness is not just their lack of ambition, but their command economy. They're doing capitalism but with central control of the economy. It intertwines government policy with corporate policy, making it harder to do business overseas (like with ByteDance/TikTok).
Mexico is a modern country, an industrialized country, a country that is exactly as "western" as the US or Canada. They have the same religious beliefs, speak a dialect of a European language. They have European style cities, a long history of cultural contributions. Yet they're not white enough to be part of "The West".
I think at this point we should be honest with ourselves about its usage. 90% of the time it's a racist dog whistle.
Really? How long has China been attempting to build their own jet engines? How long have they been attempting to build competitive CPUs?
History has shown withholding tech successfully keeps them at least a generation behind the west.
In some fields like CPUs they “make up for it” by just building larger clusters, but ultimately history does not show what you’re claiming. The only thing it shows is that we need to be even more diligent in protecting IP because a large portion of their catching up is a direct result of stealing the tech they were cut off from.
Aside from geography, attracting talent from all over the world is the one edge the US has as a nation over countries like China. But now the US is trying to be xenophobic like China, restrict tech import/export like China but compete against 10x population and lack of similar levels of internal strife and fissures.
The world, even Europe is looking for a new country to take on a leader/superpower role. China isn't there yet, but it might get there in a few years after their next-gen fighter jets and catching up to ASML.
But China's greatest weakness is their lack of ambition and focus on regional matters like Taiwan and the South China Sea, instead of winning over Western Europe and India.
That's a strength. Them not having interest in global domination and regime change other than their backyard is what allows them to easily make partners in Africa and LATAM, the most important regions for raw materials.
Do I infer correctly that you believe that China has less internal strife and fissures than the US has?
How can they have international hegemony before they clear their regional order? China is more interested in aligning Taiwan than invading; though it’ll probably invade if it can’t align it diplomatically.
China is probably not interested in continuing the current Western-style order but to implement their own sino-stuff. At least with the CCP at the helm.
I can’t tell whether you think the anti-immigration stance is a good thing or bad thing.
China has an import ban on chips [1] so it's irrelevant what the US does.
[1]: https://www.cnbc.com/2025/09/17/nvidia-ceo-disappointed-afte...
Only in response to the US banning the export of the high-end GPUs China wanted. The import ban is the Chinese government burning the landing ships: it clearly communicates to everyone that there is no going back, and total commitment is expected.
The point still stands that the US instigated the split.
The streets are flooded with cheap Chinese cars and I see more BYD than American cars. If the car wasn't made in Japan or Korea which probably account for most of the cars, it was likely made in China. Moreover, I haven't been in countries with the closest ties to China.
> US cars being completely not competitive in non-US markets
This is definitely untrue in Canada and large parts of LATAM. American cars are all over those places, along with Japanese cars. Japan eventually stopped that role and their products improved greatly.
go to 2024, western labs were crushing it.
it's now 2025, and from china, we have deepseek, qwen, kimi, glm, ernie and many more capable models keeping up with western labs. there are actually now more chinese labs releasing sota models than western labs.
They are lauded for their ability-to-cost ratio, or their ability-to-parameter ratio, but virtually everyone using LLMs for productive work is using ChatGPT/Gemini/Claude.
They are kind of like Huffy bicycles. Good value, work well, but if you go to any serious event, no one will be riding one.
while qwen, deepseek and kimi are opensourced and good, they are preferred because of their insane token ratio: they use a lot less for more, but a byproduct is that they are less accurate. it is amazing progress by the chinese companies, but they definitely can improve a lot more
Your view of China is several decades out of date. Chinese labor isn't even that cheap any more. China is moving up the value chain and outsourcing production that needs cheap labor to poorer countries (or replacing workers with robots altogether).
1. https://itif.org/publications/2024/09/16/china-is-rapidly-be...
2. https://www.economist.com/science-and-technology/2024/06/12/...
What exactly is Huawei's flagship series anyway? Because their PanGu line is open-weight, but Huawei is as of yet not in the LLM making business, their models are only meant to signal that it's possible to do training and inference on their hardware, that's all. No one actually uses those models.
China has plenty of R&D and science now.
I used to follow the ones from Western companies, but honestly, after some point in time, I would like to see some cases from what I consider is a good benchmark for everyone that does not work in FAANG in terms of engineering.
I would also assume there's a lot of content in the native Chinese forums, which unfortunately, as an English-speaking person, I wouldn't be able to easily refer to :(
[1] https://www.alibabacloud.com/blog/how-does-alibaba-ensure-th...
The authors mention that NCCL and Ray initialization were too slow (see quote below), but from the description it sounds like they’ve reimplemented a layer that’s increasingly being standardized by frameworks like nixl and uccl.
> Distributed executor: Inference engines support model parallelism via distributed executors (e.g., Ray [32] and NCCL [9]), whose initialization takes tens of seconds.
> Our current deployment runs in a cross-region cluster comprising 213 H20 GPUs, serving twenty-eight 1.8–7B models (TP=1) and nineteen 32–72B models (TP=4).
I mean, it really shouldn't take tens of seconds for those initialization(s) to occur. There's no good fundamental reason that it should take that long. It's just bloat.