Alibaba Cloud says it cut Nvidia AI GPU use by 82% with new pooling system

(tomshardware.com)

523 points by hd4 6mo ago | 315 comments

Paper: https://dl.acm.org/doi/10.1145/3731569.3764815

Alibaba Cloud claims to reduce the number of Nvidia GPUs used for serving unpopular models by 82% (emphasis mine)

> 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found

Instead of 1192 GPUs they now use 213 for serving those requests.
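
A quick sanity check of the 82% headline, using only the two GPU counts quoted in this comment: the saving is the share of the original 1,192 GPUs that were no longer needed.

```python
# GPU counts quoted from the paper's beta deployment.
gpus_before = 1192
gpus_after = 213

# Fractional saving = GPUs freed / GPUs originally reserved.
freed = gpus_before - gpus_after
saving = freed / gpus_before

print(freed, f"{saving:.1%}")  # 979 82.1%
```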

bee_rider 6mo ago

I’m slightly confused as to how all this works. Do the GPUs just sit there with the models on them when the models are not in use?

I guess I’d assumed this sort of thing would be allocated dynamically. Of course, there’s a benefit to minimizing the number of times you load a model. But surely if a GPU+model is idle for more than a couple minutes it could be freed?

(I’m not an AI guy, though—actually I’m used to asking SLURM for new nodes with every run I do!)
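
The "free it after a couple of idle minutes" idea can be sketched as a keep-alive pool. This is a toy model, not how Alibaba or any particular serving stack works: `KeepAlivePool`, the 120 s timeout and the 30 s reload cost are all hypothetical. It shows the trade-off the replies describe: a model whose requests arrive further apart than the timeout pays the full reload cost on every request.

```python
from dataclasses import dataclass, field


@dataclass
class KeepAlivePool:
    """Toy 'unload after N idle seconds' policy (hypothetical, for illustration only)."""
    idle_timeout_s: float   # how long a model may sit unused before its GPU is freed
    load_time_s: float      # assumed cold-start cost of reloading the weights
    last_used: dict[str, float] = field(default_factory=dict)  # loaded model -> last request time

    def evict_idle(self, now: float) -> list[str]:
        """Unload every model whose last request is older than the timeout."""
        evicted = [m for m, t in self.last_used.items() if now - t > self.idle_timeout_s]
        for m in evicted:
            del self.last_used[m]
        return evicted

    def request(self, model: str, now: float) -> float:
        """Serve one request; return the extra latency paid (0.0 if the model was still loaded)."""
        self.evict_idle(now)
        penalty = 0.0 if model in self.last_used else self.load_time_s
        self.last_used[model] = now
        return penalty


pool = KeepAlivePool(idle_timeout_s=120.0, load_time_s=30.0)
print(pool.request("rare-model", now=0.0))    # 30.0: first request loads the model
print(pool.request("rare-model", now=60.0))   # 0.0: still loaded
print(pool.request("rare-model", now=300.0))  # 30.0: idle for 240 s, so it was unloaded
```

A longer timeout avoids more reloads but keeps more GPUs holding idle models, which is exactly the waste the article is about.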

miki123211 6mo ago

Loading a model takes at least a few seconds, usually more, depending on model size, disk / network speed and a bunch of other factors.

If you're using an efficient inference engine like vLLM, you're adding compilation into the mix, and not all of that is fully cached yet.

If that kind of latency isn't acceptable to you, you have to keep the models loaded.
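
To put rough numbers on "at least a few seconds": weight-loading time is roughly the weight size divided by read bandwidth, plus whatever fixed start-up cost the engine adds (such as the compilation mentioned above). The function name and every figure below are assumptions for illustration, not measurements.

```python
def estimate_load_seconds(params_billion: float, bytes_per_param: float,
                          read_gb_per_s: float, fixed_overhead_s: float = 0.0) -> float:
    """Rough cold-start estimate: weight size / read bandwidth + fixed start-up cost.

    Ignores sharding across GPUs, host-to-GPU copy limits, caching and engine
    warm-up details; treat the result as an order-of-magnitude guess.
    """
    weight_gb = params_billion * bytes_per_param  # 1e9 params x bytes/param = GB
    return weight_gb / read_gb_per_s + fixed_overhead_s


# Hypothetical: 16-bit weights (2 bytes/param) read from storage at 2 GB/s.
print(estimate_load_seconds(7, 2, 2.0))                          # 7.0
print(estimate_load_seconds(70, 2, 2.0))                         # 70.0
print(estimate_load_seconds(70, 2, 2.0, fixed_overhead_s=20.0))  # 90.0
```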

This (along with batching) is why large local models are a dumb and wasteful idea if you're not serving them at enterprise scale.

cnr 6mo ago

Let's say, then, that it's not so much "dumb and wasteful" as "energy inefficient". In fact, this can be quite wise in a modern world full of surveillance-as-a-business and "us-east-1 disasters".

cnr 6mo ago

Can you elaborate on the last statement? I don't quite understand why loading a local LLM into GPU RAM, using it for the job, and then "ejecting" it would be a "dumb and wasteful" idea.

behnamoh 6mo ago

> This (along with batching) is why large local models are a dumb and wasteful idea if you're not serving them at enterprise scale.

Local models are never a dumb idea. The only time it's dumb to use them in an enterprise is if the infra is a Mac Studio with an M3 Ultra, because prompt-processing (pp) time is terrible.

svachalek 6mo ago

Models take a lot of VRAM, which is tightly coupled to the GPU, so yeah, it's basically sitting there with the model, waiting for use. I'm sure they probably do idle out, but a few minutes of idle time is a lot of waste, possibly the full 82% mentioned. In this case they optimized by letting the GPUs load multiple models and sharing the load out by token.

jychang 6mo ago

They definitely won't idle out: if they did, it would take up to about 60 seconds to load the model back into VRAM, depending on the model.

That's an eternity for a request. I highly doubt they will time out any model they serve.

andy_ppp 6mo ago

How does this work with anything but trivially small context sizes!?
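
Context size matters because every in-flight request keeps a key/value (KV) cache on the GPU next to the model weights, and that cache grows linearly with context length. A sketch of the standard estimate, using a hypothetical model shape (32 layers, 8 KV heads of dimension 128, 16-bit cache entries); plug in a real model's config to get its numbers.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int, seq_len: int,
                   batch: int = 1, bytes_per_elem: int = 2) -> int:
    """KV-cache size estimate: one key and one value vector per layer,
    per KV head, per token, per sequence in the batch."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model shape: 32 layers, 8 KV heads of dimension 128, 16-bit entries.
per_token = kv_cache_bytes(32, 8, 128, seq_len=1)
one_32k_request = kv_cache_bytes(32, 8, 128, seq_len=32_768)

print(per_token // 1024, "KiB per token")                    # 128 KiB per token
print(one_32k_request / 2**30, "GiB per 32k-token request")  # 4.0 GiB per 32k-token request
```

So ten concurrent 32k-token requests to this one hypothetical model would need about 40 GiB of cache on top of its weights, which eats directly into how many models can share a GPU.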

smallnix 6mo ago

> I guess I’d assumed this sort of thing would be allocated dynamically

At the scale of a hyperscaler I think Alibaba is the one that would be doing that. AWS, Azure and I assume Alibaba do lease/rent data centers, but someone has to own the servers / GPU racks. I know there are specialized companies like nscale (and more further down the chain) in the mix, but I always assumed they only lease out fixed capacity.

yorwba 6mo ago

The paper is about techniques to do that dynamic allocation to maximize utilization without incurring unacceptable latencies. If you let a GPU sit idle for several minutes after serving a single request, you're setting money on fire. So they reuse it for a different model as soon as possible, starting even before the first request is finished. After all, if you don't have a dedicated GPU for a model, are you going to wait for a multi-gigabyte transfer before each request? So they have a dedicated GPU (or two: one for prefill, one for decode) for a group of models that are processed in an interleaved fashion, scheduled such that they stay within the latency budget.
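
A toy version of the interleaving described above: several models share one GPU, the scheduler hands out one decode step (one token) at a time in round-robin order, and switching between models costs extra time. The round-robin policy, the fixed per-step and switch costs, and the function name are assumptions made for illustration; the paper's scheduler is not reproduced here. The useful output is the worst gap between two tokens of the same model, which is what a per-token latency budget has to bound.

```python
from collections import deque


def interleave_decode(pending: dict[str, int], step_ms: float, switch_ms: float,
                      budget_ms: float) -> tuple[list[str], float, bool]:
    """Round-robin, token-level scheduling of several models on one GPU (toy sketch).

    pending maps each model to the number of tokens it still has to generate.
    Each turn produces ONE token for one model; changing the active model first
    costs switch_ms. Returns the order in which tokens were produced, the worst
    gap between consecutive tokens of the same model, and whether that gap fits
    within budget_ms. (Time to the first token is not counted in the gap.)
    """
    queue = deque(m for m, n in pending.items() if n > 0)
    remaining = dict(pending)
    last_emit: dict[str, float] = {}
    order: list[str] = []
    clock, active, worst_gap = 0.0, None, 0.0

    while queue:
        model = queue.popleft()
        if model != active:          # pay the (assumed) cost of switching models
            clock += switch_ms
            active = model
        clock += step_ms             # one decode step = one token for this model
        if model in last_emit:
            worst_gap = max(worst_gap, clock - last_emit[model])
        last_emit[model] = clock
        order.append(model)
        remaining[model] -= 1
        if remaining[model] > 0:     # still generating: back of the line
            queue.append(model)

    return order, worst_gap, worst_gap <= budget_ms


# Two hypothetical models, 2 tokens each: 10 ms per step, 5 ms per switch, 40 ms budget.
print(interleave_decode({"a": 2, "b": 2}, step_ms=10, switch_ms=5, budget_ms=40))
# (['a', 'b', 'a', 'b'], 30.0, True)
```

With a third model in the same group the worst gap rises to 45 ms and breaks the 40 ms budget: each model waits for every other model's step and switch before its next token, so the latency budget caps how many models one GPU can serve.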

citizenpaul 6mo ago

>Do the GPUs just sit there with the models on them when the models are not in use

I've assumed that as well. It makes sense to me, since loading up a model locally takes a while. I wonder if there is some sort of better way I'm not in the know about. That, or I'm too GPU-poor to know about it.

make3 6mo ago

The models are huge, so not a single latest-gen one can fit on a single GPU.

It's likely that these are small, unpopular (non-flagship) models, or that they only pack, e.g., one layer of each model.

svachalek 6mo ago

Per the very short article, the solution was to pack multiple models per GPU.
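
Whether "multiple models per GPU" is plausible depends mostly on model size. A back-of-the-envelope sketch, assuming a 96 GB card (the memory of an H20, the GPU named in the thread), 16-bit weights, and 30% of memory held back for KV cache; the function and the reserve fraction are illustrative assumptions, and real deployments also lose memory to activations and runtime overhead.

```python
def models_per_gpu(gpu_mem_gb: float, params_billion: float,
                   bytes_per_param: float = 2.0, kv_reserve_frac: float = 0.3) -> int:
    """How many models of a given size fit on one GPU after reserving memory for KV cache.

    Counts weights only; activations, runtime overhead and fragmentation are
    ignored, so real capacity will be lower.
    """
    usable_gb = gpu_mem_gb * (1.0 - kv_reserve_frac)
    weight_gb = params_billion * bytes_per_param
    return int(usable_gb // weight_gb)


for size_b in (1.5, 7, 70):
    print(f"{size_b}B model: {models_per_gpu(96, size_b)} per 96 GB GPU")
# 1.5B model: 22 per 96 GB GPU
# 7B model: 4 per 96 GB GPU
# 70B model: 0 per 96 GB GPU  (needs several GPUs on its own)
```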

hinkley 6mo ago

So 82% of 17.7%?

14.5% is worth a raise at least. But it’s still misleading.
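
The arithmetic in this comment, made explicit. It assumes the 82% saving applies to the whole 17.7% slice of the fleet, an assumption the replies go on to question.

```python
cold_gpu_share = 0.177   # share of GPUs reserved for rarely used models (quoted)
cold_saving = 0.82       # GPU reduction reported for those models (quoted)

# If the saving applied to the whole cold slice, the fleet-wide saving would be:
fleet_saving = cold_gpu_share * cold_saving
print(f"{fleet_saving:.1%}")  # 14.5%
```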

abejfehr 6mo ago

I don't think that's what this is saying. Isn't it that 100 - ~82 ≈ 17.7%?

hinkley 6mo ago

That is a confusing coincidence, but no.

> Reserving full GPU instances for these models leads to allocating 17.7% of our GPUs to serve only 1.35% of requests

> Deployment results show that Aegaeon reduces the number of GPUs required for serving these models from 1,192 to 213, highlighting an 82% GPU resource saving.

About 82% of their GPUs were serving 98.6% of all traffic. If they reduced the cluster size, they got it to 96.2% of their GPUs serving 98.6% of their traffic. If they reallocated those, which is more likely, then 96.8% of their GPUs are serving 98.6% of all requests, or around 17% more capacity for popular requests on the same hardware.
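
The two scenarios in this comment can be recomputed from the quoted numbers, treating the fleet as 100% and the cold slice as 17.7%. This is a sketch of the comment's reasoning, not figures from the paper; it gives about 96.3% for the shrink case (96.2% looks like a rounded-down value), 96.8% for the reallocation case, and roughly 17.7% more capacity for popular models.

```python
cold_share = 0.177                    # fraction of GPUs on cold models before the change
hot_share = 1 - cold_share            # fraction on popular models (~82.3%)
cold_after = cold_share * 213 / 1192  # cold slice after the reported 1192 -> 213 cut

# Scenario 1: freed GPUs are retired, so the fleet shrinks.
hot_if_shrunk = hot_share / (hot_share + cold_after)

# Scenario 2: freed GPUs are given to popular models; fleet size is unchanged.
hot_if_reallocated = hot_share + (cold_share - cold_after)
extra_hot_capacity = hot_if_reallocated / hot_share - 1

print(f"{hot_if_shrunk:.1%} {hot_if_reallocated:.1%} {extra_hot_capacity:.1%}")
# 96.3% 96.8% 17.7%
```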

yorwba 6mo ago

Not really. Figure 1(a) of the paper says that the 17.7% is relative to a total of 30k GPUs (i.e. 5310 GPUs for handling those 1.35% of requests), and the reduction is measured in a smaller beta deployment with only 47 different models (vs. the 733 "cold" models overall). Naïve extrapolation by model count suggests they would need 3321 GPUs to serve all cold models, a 37.5% reduction compared to before (or a 6.6% reduction of the full 30k-GPU cluster).

somerandomdude2 6mo ago

Really:

"A paper presented at SOSP 2025 details how token-level scheduling helped one GPU serve multiple LLMs, reducing demand from 1,192 to 213 H20s."

Which, if you scale it, matches the GP's statement.

yorwba 6mo ago

From the SCMP article you might get the impression that the various figures all refer to the same GPU cluster, but in the paper itself it's very clear that this is not the case, i.e. the 213 GPUs in the smaller cluster are not serving 1.35% of the requests in the larger cluster. Then if you want to scale it, you have a choice of different numbers you could scale, and each would get different results. Since they're constrained by the limited number of different models a single GPU can serve, I think scaling by the number of models is the most realistic option.
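
The competing extrapolations in this subthread can be put side by side. The constants are the figures quoted in the thread; the two scaling rules are the assumptions being argued about, and neither is the paper's own projection. Scaling by the beta's GPU ratio reproduces the ~82% (about 14.5% of the 30k fleet); scaling by model count gives about 37.4% (37.5% if the projection is truncated to 3,321 GPUs), or about 6.6% of the fleet.

```python
TOTAL_GPUS = 30_000                      # fleet size per Figure 1(a), as quoted
COLD_GPUS = round(TOTAL_GPUS * 0.177)    # 5310 GPUs reserved for cold models
COLD_MODELS = 733                        # cold models in the full marketplace
BETA_MODELS, BETA_BEFORE, BETA_AFTER = 47, 1192, 213  # beta deployment


def report(label: str, needed: float) -> None:
    """Print the projected GPU count and the saving it implies."""
    cold_saving = 1 - needed / COLD_GPUS
    fleet_saving = (COLD_GPUS - needed) / TOTAL_GPUS
    print(f"{label}: ~{needed:.0f} GPUs, {cold_saving:.1%} of cold slice, "
          f"{fleet_saving:.1%} of fleet")


# Assumption A: the beta's before/after GPU ratio carries over unchanged.
by_gpu_ratio = COLD_GPUS * BETA_AFTER / BETA_BEFORE
# Assumption B: GPUs needed grow with the number of distinct models,
# because each GPU can only host a limited number of them.
by_model_count = BETA_AFTER * COLD_MODELS / BETA_MODELS

report("by GPU ratio", by_gpu_ratio)
report("by model count", by_model_count)
# by GPU ratio: ~949 GPUs, 82.1% of cold slice, 14.5% of fleet
# by model count: ~3322 GPUs, 37.4% of cold slice, 6.6% of fleet
```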

xor1101 6mo ago

Doesn't sound right.

MangoCoffee 6mo ago

In the past, software and computer engineers would tackle problems head-on, designing algorithms and finding creative solutions.

Thanks to the US restrictions on the Chinese semiconductor industry, Chinese engineers are being forced to innovate and find their own ways to overcome challenges, like the old-school engineers (what Silicon Valley used to be).

_heimdall 6mo ago

If you're one who sees progress as an end goal unto itself, what you describe is a good thing. When one party is attempting novel solutions to outcompete the competition we will be faster to whatever the next change is.

That said, I'm not sure what the US policies specifically have to do with this. Countries are always in competition with one another, and if one industry or technology is considered a national security threat they will guard it.

coliveira 6mo ago

If AI is a threat to other nations, why is anyone even supporting this? Are we really trying to annihilate the planet as quickly as possible?

djoldman 6mo ago

Key paragraph:

> However, a small handful of models such as Alibaba’s Qwen and DeepSeek are most popular for inference, with most other models only sporadically called upon. This leads to resource inefficiency, with 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found.
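
One way to read the 17.7% / 1.35% figure is as requests handled per GPU. This is a rough measure: requests differ in length, and the cold models may be smaller, so it says nothing about compute per request.

```python
cold_gpu_share, cold_request_share = 0.177, 0.0135   # quoted above

# Share of requests per share of GPUs, for cold and for popular ("hot") models.
cold_load = cold_request_share / cold_gpu_share
hot_load = (1 - cold_request_share) / (1 - cold_gpu_share)

print(f"~{hot_load / cold_load:.0f}x more requests per GPU on popular models")
# ~16x more requests per GPU on popular models
```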

make3 6mo ago

these other models are likely much smaller

majke 6mo ago

better link https://www.tomshardware.com/tech-industry/semiconductors/al...

paper https://dl.acm.org/doi/10.1145/3731569.3764815

dang 6mo ago

Ok, we've changed the URL above (from https://www.scmp.com/business/article/3329450/alibaba-cloud-...), and will put the link to the paper in the top text. Thanks!

hd4 (OP) 6mo ago

Feel like I've made it

hunglee2 6mo ago

The US attempt to slow down China's technological development succeeds on the basis of preventing China from directly following the same path, but may backfire in the sense that it forces innovation by China in a different direction. The overall outcome for us all may be increased efficiency as a result of this forced innovation, especially if Chinese companies continue to open source their advances, so we may in the end have reason to thank the US for their civilisational gatekeeping.

dlisboa 6mo ago

History has shown that withholding technology from China does not significantly stop them and they'll achieve it (or better) in a small number of years.

In many senses there's hubris in the western* view of China's accomplishments: most of what western companies have created has had significant contribution by Chinese scientists or manufacturing, without which those companies would have nothing. If you look at the names of AI researchers there's a strong pattern even if some are currently plying their trade in the west.

---

* I hate the term "western" because some "westerners" use it to separate what they think are "civilized" from "uncivilized"; hence, for them, LATAM is not "western" even though everything about LATAM countries is western.

achierius 6mo ago

> most of what western companies have created has had significant contribution by Chinese scientists or manufacturing, without which those companies would have nothing. If you look at the names of AI researchers there's a strong pattern even if some are currently plying their trade in the west.

While I don't disagree with your overall point, it's important to recognize that this is only a phenomenon of the last ~30 years, and to avoid falling into the trap of Han racial chauvinism. E.g. there were ~no Chinese scientists in Germany in the 70s but they were heavily innovating nevertheless.

dlisboa 6mo ago

Absolutely. China obviously has a longer history with innovation but they like to make it seem everything was invented by them at some point in the past. I'd say newer technology is where China has had a bigger impact.

Consequently newer tech is precisely where global cooperation is most required so no country can really do it by themselves. We could even say no country, western or otherwise, has been doing it on their own for the past 500 years or so but alas...

hshdhdhj4444 6mo ago

Ironically, the best way America could have prevented China’s rise in tech was by stapling green cards to diplomas of Chinese citizens who completed their higher education in the U.S. like the plan in the early 2010s.

lurk2 6mo ago

Massive vector for theft of trade secrets and intellectual property.

It’s notable that China did not adopt the same policy during the period you are associating with their rise. Indeed, they’ve taken the opposite stance in recent years and (now that they have stolen American IP) have moved to seize control of assets and expel the superfluous foreigners.

There is a lesson to be learned there, but it’s contrary to the argument you are trying to make.

deadbabe 6mo ago

But they didn’t do it, because the current administration can’t get it through their thick skulls that the key advantage the US can have in this world is a monopoly on all the really smart people.

ahmeneeroe-v2 6mo ago

Is that the best way? China's rise had already happened by the 2010s.

That could have been prevented in the 70s, 80s, 90s by stopping offshoring, blocking student visas, and prosecuting IP theft.

huntertwo 6mo ago

The whole “China copies everything” narrative is becoming less and less true.

It’s funny - it’s at the point with Chinese manufacturing for niche electronic goods (e.g. rooftop van air conditioners) where some Chinese brands are more trustworthy - more value for your money and sometimes even better overall quality. With American brands you gotta make sure you’re not overpaying for dated tech that is inefficient. Maybe the same will happen with LLMs.

ehnto 6mo ago

It's most notable to me in mid-level manufacturing equipment. Once upon a time you would never touch a Chinese-made CNC, lathe, mill, etc. Now they're totally fine, and offer significant value for your dollar, sometimes outperforming other countries' offerings while being cheaper to boot, especially in new industries and processes, suggesting innovation is not the differentiator it used to be.

Enterprises often prefer having US based support and so can prefer US or European machines that have that supply chain setup.

xbar 6mo ago

It is less exclusively true.

onlyrealcuzzo 6mo ago

> History has shown that withholding technology from China does not significantly stop them and they'll achieve it (or better) in a small number of years.

It's worked for a very long time for aircraft.

China has been pushing to build its own aircraft for >23 years. It took 14 years for COMAC to get its first regional jet flying commercial flights on a Chinese airline, and 21 years to get a narrow-body plane flying a commercial flight on a Chinese airline.

For purely political rather than technical reasons, COMAC may still be decades away from being able to fly to most of the world.

Likewise, in ~5 years, China may be able to build chips that are as good as Nvidia's after accounting for Nvidia's 90% profit margin, i.e. chips that are 1/10th as good for the price; but since they can buy them at cost, they're the same price for performance and good enough.

For purely political reasons, China may never be able to export these chips to most of the world, which limits their scale, which in turn makes it harder to make them cost-effective compared to Western chips.
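
The margin argument above is easier to follow as arithmetic. The 90% margin is the commenter's figure, treated here as profit as a share of the selling price; the helper name and the normalised costs are illustrative.

```python
def perf_per_dollar(relative_perf: float, price: float) -> float:
    return relative_perf / price


cost = 1.0                          # normalised cost to manufacture one chip
margin = 0.90                       # profit as a share of the selling price (commenter's figure)
list_price = cost / (1 - margin)    # so that 90% of the price is profit: 10x cost

bought_at_list = perf_per_dollar(1.0, list_price)  # full-speed chip at list price
built_at_cost = perf_per_dollar(0.1, cost)         # chip 1/10th as fast, at cost

print(round(list_price, 6), round(bought_at_list, 6), round(built_at_cost, 6))
# 10.0 0.1 0.1
```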

zawaideh 6mo ago

Re: Western. A similar thing plays out when the term "international community" is used in news. It refers to the US and its major allies which means US, Canada, Western Europe, Japan, Australia and New Zealand more or less.

heavyset_go 6mo ago

It's just straight up low expectations and underestimation derived from racism in the assumption that Americans are smarter and more capable, and Chinese are only good for copying designs and making things we come up with. The idea that they can't do that like we can is pervasive.

sokoloff 6mo ago

It’s absolutely fascinating and mystifying (and dismaying) to see educated and otherwise smart people in engineering in the US hold an opinion that two countries with over a billion people each can’t accomplish anything in engineering or science without “us”.

Despite ample and repeated evidence that they can and, in China’s case, that they’re the best in the world in several areas of manufacturing.

hopelite 6mo ago

I am not sure exactly to what degree, but "I hate the term 'western' because some 'weste[r]ners' use it to separated what they think are 'civilized' from 'uncivilized'" is definitely a bit of an antiquated perspective at this point; almost like a justification to hold on to other older perspectives about "racism". I have started resorting to using terms like European Cultural Block because of it in certain communities that understand contemporary topics and have an advanced understanding of the world.

Your first statement is not likely unique to China though, even though they have demonstrated that in about the last 40 years, which I don't really think qualifies as "history". What it does demonstrate is that societies that have a certain kind of ethnic self-respect and can cast off the detrimental influences of foreign, hostile, and even enemy elements to pursue their own self-interest and survival will succeed, regardless of hurdles placed before them.

It's really just a story of personal development and of escaping, evading, and avoiding detrimental, toxic people and their behaviors. All of humanity, which still has to share a single planet with ZERO save spots, would be better off if we all simply allowed each other to be ourselves in our own places, without anyone subverting, subjugating, infiltrating, dominating, poisoning, or polluting any other people on the planet. Then everyone can decide whether to be friends with each other, to collaborate and be friendly, or simply to avoid each other. We do not have to like each other to get along if everyone agrees on a base understanding that no people can parasitize, abuse, or manipulate any others.

rayiner 6mo ago

Nobody thinks the Japanese aren’t “civilized.” “Western” is just a euphemism for “rich and orderly.”

switchbak 6mo ago

It is an odd category, and Japan is often considered to be "Western" - these days at least. That certainly wasn't the case even a few generations ago.

I think it's ostensibly supposed to be more about shared cultural values, but even that is a pretty weak way to divide countries. Perhaps "an ally of the United States" is a little more accurate?

Any societal dividing line like this is bound to hit on problems once subjected to the real world.

bad_haircut72 6mo ago

It's more about democracy and adhering to the global system of laws and trade set up by America after WW2.

corimaith 6mo ago

>History has shown that withholding technology from China does not significantly stop them and they'll achieve it (or better) in a small number of years.

I don't think you can really produce a definite counterfactual that they would or wouldn't have taken longer or shorter without it, but certainly they were pushing for self-sufficiency long before technology restrictions. But we're not going to be handing our technologies to our competitors on a silver platter, and it's also best for businesses to start weaning themselves off the Chinese market. Virtually every market reliant on them today is in big trouble.

As for hubris, I think that's more a projection on your part. If you want to start bringing up race cards with regards to contributions, that kind of argument would be applicable to everyone. And AI research is highly diverse and international; Chinese names don't dominate the list more than Turks, Greeks, Malaysians, etc.

raincole 6mo ago

> look at the names

Why would I do that tho? If we look at the names of scientists/researchers/engineers/businessmen, the conclusion would be that the US has contributed nothing to the world. Europeans did all the hard work!

thesmtsolver 6mo ago

Another equivalent way to look at that:

Historically, top scientists/researchers/engineers/businessmen migrate from rest of the world to the US rather than to Europe or China.

Imagine if Europe or China were a bit more open with immigration and equally attractive, we would see the same pattern there too.

caycep 6mo ago

this is true for anyone - create challenges, and you optimize efficiency elsewhere.

Also, isn't this the usual path to better computer science? Reducing computation needs by making better/more efficient algorithms? The whole "trillions of dollars of brute force GPU strength" proposed by Altman, Nadella, Musk et al just seems to reinforce that these are business people at heart, not engineers/computer scientists...

JuniperMesos 6mo ago

What term would you prefer? When you say "western view of China", do you mean to include or not include Latin American countries as part of the group of countries you claim has a hubristic view of China's accomplishments?

tsunamifury 6mo ago

It’s helpful to think of westernism as a platonic ideal. Individually derived reason and virtue, superior to state and sometimes ‘gods’ as a tradition to drive up the total survivability, richness, and stability of the community.

Concepts that enable the individual should empower a chosen configuration of society not the other way around.

Contrast this with non westernism where either education of the individual is not valued or the state is the primary goal over the individual.

I’ve worked with states, governments, and individuals around the world for 20 years and find this a very useful definition. What’s confusing is the nations who have half adopted westernism but don’t fully, due to either caste systems or government-dominated thinking.

It’s an arrow towards rationalism over tradition, individualism over collectivism, flatness over hierarchy, and future over past, but only to the limit of the resources any given society has.

lawlessone 6mo ago

In a way withholding a tech becomes a signal saying "Hey this is important" so the result is China dedicates more resources to researching it lol.

notepad0x90 6mo ago

Western is a cultural term derived from a geographic one. The US is also not 'western' strictly geographically, as it is not in Western Europe; neither is Australia. But they both originated from Britain's empire and share in its cultural ancestry. It means "Western Europe and its cultural derivatives". Spain's and Portugal's empires fell away long before Britain's and France's, and they don't have similar geopolitical relations like NATO, so it's hard to consider their former colonies/upstarts part of the same sphere of cultural influence.

China for sure will catch up; the question is what they will do with it. They're not ambitious like the US/West. The US wanted influence all over the world as an extension of the Cold War and to safeguard its economic interests. But China just doesn't operate that way. They're more hands-off. They could be opening up Alibaba Cloud datacenters all over the US, offering them as an AWS/Azure alternative, and funding tons of startups all over Europe, the US, etc. to exert their influence, but they won't. They have a more long-term, low-and-slow approach to global domination: the "100 year marathon", as they called it, which they'll win for sure.

China's greatest weakness is not just their lack of ambition, but their command economy. They're doing capitalism but with central control of the economy. It intertwines government policy with corporate policy, making it harder to do business overseas (like with ByteDance/TikTok).

tsunamifury 6mo ago

False.

Westernism is broadly an extension of the academic notion of classicism, starting in Egypt and then Greece Rome and into Europe and the Americas.

MSFT_Edging 6mo ago

The whole "western" or "the west" always makes me laugh. Half the time it's a dog whistle for "white". Like many right-wing commentators love saying "Western Values" to avoid saying "white, Euro-centric, Christian values".

Mexico is a modern country, an industrialized country, a country that is exactly as "western" as the US or Canada. They have the same religious beliefs, speak a dialect of a European language. They have European style cities, a long history of cultural contributions. Yet they're not white enough to be part of "The West".

I think at this point we should be honest with ourselves about its usage. 90% of the time it's a racist dog whistle.

andsoitis 6mo ago

> Yet they're not white enough to be part of "The West".

In many contexts, Mexico and other LatAm countries are included in the Western Civilization grouping. For instance: https://worldpopulationreview.com/country-rankings/western-c...

Earlier in your comment you say “half the time” while you end with “90% of the time” the phrase “western” is a racist expression, undermining your argument that is already flawed, emotional, and anti-constructive.

skinnymuch 6mo ago

Mexico isn’t as “western” as the west. Mexico isn’t the first world. It’s not part of the exploiting Global North.

The stuff you bring up ignores the power dynamics, which are arguably the most important part.

lazide 6mo ago

But why would anyone want to do that?

You do realize that antagonizing people with nuclear weapons and the largest economies in the world rarely results in positive results, right?

tw04 6mo ago

> History has shown that withholding technology from China does not significantly stop them and they'll achieve it (or better) in a small number of years.

Really? How long has China been attempting to build their own jet engines? How long have they been attempting to build competitive CPUs?

History has shown withholding tech successfully keeps them at least a generation behind the west.

In some fields like CPUs they “make up for it” by just building larger clusters, but ultimately history does not show what you’re claiming. The only thing it shows is that we need to be even more diligent in protecting IP because a large portion of their catching up is a direct result of stealing the tech they were cut off from.

slaw 6mo ago

Jet engine since at least 2011. CPU since 2000

https://en.wikipedia.org/wiki/ACAE_CJ-1000A

https://en.wikipedia.org/wiki/Semiconductor_Manufacturing_In...

nextworddev 6mo ago

Name one thing China has invented first in LLMs that the “west” adopted as a standard

nextworddev 6mo ago

Your silence is deafening, qwen bots

chuckadams 6mo ago

I find "western" is often used to disparage "western thought", as in it can't grasp the deep wisdom of those mysterious orientals that transcends normal logic and reason. Declaring such a split is the underpinning of a whole lot of woo-woo beliefs.

hbarka 6mo ago

Disparage or exalt? It can also be used in an objective sense without conjuring insult.

notepad0x90 6mo ago

I think anti-immigrant rhetoric will have the most impact against the US. A lot of the people innovating on this stuff are being maligned and leaving in droves.

Aside from geography, attracting talent from all over the world is the one edge the US has as a nation over countries like China. But now the US is trying to be xenophobic like China, restrict tech import/export like China but compete against 10x population and lack of similar levels of internal strife and fissures.

The world, even Europe, is looking for a new country to take on a leader/superpower role. China isn't there yet, but it might get there in a few years after their next-gen fighter jets and catching up to ASML.

But, China's greatest weakness is their lack of ambition and focus on regional matters like Taiwan and south china sea, instead of winning over western europe and india.

dlisboa 6mo ago

> But, China's greatest weakness is their lack of ambition and focus on regional matters like Taiwan and south china sea, instead of winning over western europe and india.

That's a strength. Them not having interest in global domination and regime change other than their backyard is what allows them to easily make partners in Africa and LATAM, the most important regions for raw materials.

onetimeusename 6mo ago

I went to a school that was heavy on immigrants and had lots of 1st gen citizens as students and all they did was advocate against people like me for admissions and for preferential admissions for their own group. So in my opinion, skilled immigration is not a transfer of talent but an expansion of the upper classes who go to war with each other over a small number of seats. Ironically this zero sum game keeps overall skill levels the same. For every immigrant, say, one citizen loses a seat somewhere.

bkandel 6mo ago

China's greatest weakness is that their working-age population has already peaked and is in the process of plummeting, which will continue over the coming decades.

hollerith 6mo ago

>But now the US is trying to be xenophobic like China, restrict tech import/export like China but compete against 10x population and lack of similar levels of internal strife and fissures.

Do I infer correctly that you believe that China has less internal strife and fissures than the US has?

notepad0x90 6mo ago

By perception of the population at least, yes. I mean, the US is literally on the verge of a civil war lol.

csomar 6mo ago

> But, China's greatest weakness is their lack of ambition and focus on regional matters like Taiwan and south china sea, instead of winning over western europe and india.

How can they have international hegemony before they clear their regional order? China is more interested in aligning Taiwan than invading; though it’ll probably invade if it can’t align it diplomatically.

China is probably not interested in continuing the current Western-style order but to implement their own sino-stuff. At least with the CCP at the helm.

notepad0x90 6mo ago

They've dominated their region for a long time. Vietnam and NK are basically their satellites. Russia is their close ally. The only regional opposition they have is India. Taiwan is a small bug to them; the only reason they haven't invaded it is TSMC and ASML: mainland China hasn't caught up with them and Nvidia yet.

They're all over Africa and South Asia. But unlike the US/West they don't exert political influence. When they build infrastructure, for example, they set up worker camps isolated from the local population. They only employ their own imported people and clean up and leave quietly afterwards.

They're acting like good business partners, instead of a superpower wielding its might and extending its influence. It's good for business for all involved parties for sure, and smart too. But not having strong influence means, for example, the US can come in, outbid them, bail out African loans to China, and they lose that source of commerce.

rayiner 6mo ago

> But now the US is trying to … compete against 10x population and lack of similar levels of internal strife and fissures.

I can’t tell whether you think the anti-immigration stance is a good thing or bad thing.

notepad0x90 6mo ago

It's bad for the US, because China has 10x the population. The US can't make up in quality what it lacks in quantity without immigration and attracting foreigners.

marknutter 6mo ago

Nobody is anti-immigrant outside of a small pocket of anti-H1B folks in the tech community. People are, however, anti-illegal-immigrant, which is completely different.

notepad0x90 6mo ago

A lot of actual racist Nazis hijack the anti-illegal-immigrant sentiment. I get that it's illegal, and every US administration has enforced it strictly (despite popular rhetoric).

But the value illegal immigrants bring to the US economy cannot be overstated. Purely from an economic standpoint, illegal immigrants are a huge asset. There are other portions of the population that are largely a liability.

It's not like illegal immigrants are taking skilled work americans could be doing. And let's be honest, even without illegal immigrants, a lot of unskilled work will be replaced by AI/automation.

I personally have no problem with humane and lawful enforcement of immigration laws. But given that it is a detriment to the economy, perhaps enforcement should focus on more serious and concerning crimes? Perhaps the targets should be employers of illegal immigrants? Perhaps zip-tying children, locking them in cages, and denying them basic hygiene is not the right approach? I think the details are where it gets controversial; most sane people would agree that laws should be enforced.

coliveira 6mo ago

The greatest weakness of the US is its utter lack of self awareness and its ambition to dominate others. Nobody is looking for another "leader", people just want to live well without a bully on their neck. So, many countries that are not part of the US closed club are welcoming China as a new business partner.

lesuorac 6mo ago

The US isn't slowing China anymore.

China has an import ban on chips [1], so it's irrelevant what the US does.

[1]: https://www.cnbc.com/2025/09/17/nvidia-ceo-disappointed-afte...

overfeed 6mo ago

> China has an import ban on chips

Only in response to the US banning the export of the high-end GPUs China wanted. The import ban is the Chinese government burning the landing ships: it clearly communicates to everyone that there is no going back, and total commitment is expected.

xadhominemx 6mo ago

The US is certainly slowing down China considerably. China would certainly not have an import ban on Blackwell GPUs if they were made available. And upstream, the ban on EUV and other high-end semiconductor production equipment has severely limited China’s capacity to produce logic and DRAM (including HBM).

unethical_ban 6mo ago

Would they have done that if the US had been more "reliable" in providing the chips and didn't cut them off in the first place?

The point still stands that the US instigated the split.

reliabilityguy 6mo ago

Tbh this whole situation reminds me of how Japan excelled at making a lot more with a lot less after WW2, e.g., fuel-efficient engines, light cars, etc. These constraints were not present in the US (and to some extent in Europe), and resulted in US cars being completely not competitive in non-US markets.

dataviz1000 6mo ago

I've been in Chile, Peru, Colombia, Panama, and Costa Rica.

The streets are flooded with cheap Chinese cars and I see more BYD than American cars. If the car wasn't made in Japan or Korea which probably account for most of the cars, it was likely made in China. Moreover, I haven't been in countries with the closest ties to China.

throwaway2037 6mo ago

> US cars being completely not competitive in non-US markets

This is definitely untrue in Canada and large parts of LATAM. American cars are all over those places, along with Japanese cars.

tsunamifury 6mo ago

The premature optimizer is never the innovator.

Japan eventually stopped that role and their products improved greatly.

segmondy 6mo ago

may backfire? it's a bit too late for that.

go to 2024, western labs were crushing it.

it's now 2025, and from china, we have deepseek, qwen, kimi, glm, ernie and many more capable models keeping up with western labs. there are actually now more chinese labs releasing sota models than western labs.

Workaccount2 6mo ago

But they aren't keeping up

They are lauded for their ability-to-cost ratio or their ability-to-parameter ratio, but virtually everyone using LLMs for productive work is using ChatGPT/Gemini/Claude.

They are kind of like Huffy bicycles. Good value, work well, but if you go to any serious event, no one will be riding one.

segmondy 6mo ago

they are keeping up. i have been using just chinese models for the last 2 years. chatgpt/gemini/claude have marketing. there's nothing that you can do with those models that can't be done with deepseek, glm or kimi. if there is, do let us know.

MSFT_Edging 6mo ago

The downside of their efficiency and cost-ratio is that they undermine the circular economy of massive data centers, GPU sales, and VC money that is constructing an extremely wasteful bubble.

rasz 6mo ago

Have you tried using those models? qwen, for example, can't even do something as basic as clustering analysis on a list of integers; hell, it goes off the rails just reading said integers from a file. It starts babbling about determining the number of digits and indexes, tries concatenating numbers together into one big string; no idea wtf is going on with that model.

NSPG911 6mo ago

way too early to say that

While Qwen, DeepSeek and Kimi are open source and good, they are preferred because of their insane token ratio: they use a lot less for more. A byproduct is that they are less accurate. It is amazing progress by the Chinese companies, but they can definitely improve a lot more.

hunglee2 6mo ago

Too early to call a winner, though it is disappointing to see the US withdraw from open source. Still, the main outcome of open source is the distribution/diffusion of ideas, so US open source will inevitably come back, hopefully via some grassroots maniac; a Linus-like character will emerge at some point.

segmondy 6mo ago

i'm not calling a winner, i'm just saying that the chinese have caught up despite the embargo. google, openai & anthropic have phenomenal models. i stopped using openai & anthropic after they called for open weight/source regulation. i use google because they offer gemma and i got a year of gemini-pro subscription for free, i use openai's gpt-oss-120b since i can run it at home, and the only model i currently pay for is a chinese model.

mixologist 6mo ago

user growth has slowed. the technology that should help users is only being pushed from the top, while users refuse to use it. openai pivoted to porn.

does it really feel like they have a chance to recover all the expenses in the future?

crypto grifters pivoted to ai and, same as last time, normal people don’t want to have anything to do with them.

considering the amount of money burned on this garbage, i think we can at least declare a loser.

rzerowan 6mo ago

Fingers crossed for convergence rather than divergence in the technical standards. Although, the way things are going, it looks like the two stacks will diverge sooner rather than later, with the US+ banning the use of CHN models while simultaneously banning the export of its quasi-open models. We may very well end up in a situation like the old PAL vs NTSC video standards, except that PAL (EU/Asia/Africa) and NTSC (Americas/Japan) gradually converged with the adoption of digital formats, whereas here there would be a divergence based on geopolitical considerations.

hunglee2 6mo ago

Positive take: a bifurcated tech tree might give us (humanity) a better chance of faster advancement, as it would be a persistent A/B test in a live environment. Where I would join you in crossing fingers is in ensuring such A/B testing is competitive but not destructive. We may even evolve toward complementarity, an American Yin to the Chinese Yang. Let's hope so!

myth_drannon 6mo ago

China's innovation relies on stolen Western IP; without it, China is nothing. Also, the USSR/Russia is no longer a scientific powerhouse that can supply China with military innovation. A dictatorship combined with cheap labour 100% guarantees that the country's innovation is stunted, no matter what the Chinese propaganda claims.

DiogenesKynikos 6mo ago

Nowadays, China produces about the same amount of high-quality R&D as the US does.

Your view of China is several decades out of date. Chinese labor isn't even that cheap any more. China is moving up the value chain and outsourcing production that needs cheap labor to poorer countries (or replacing workers with robots altogether).

lossolo 6mo ago

Every single sentence you wrote is untrue and can be disproven by empirical evidence. You can learn about it here[1][2].

1. https://itif.org/publications/2024/09/16/china-is-rapidly-be...

2. https://www.economist.com/science-and-technology/2024/06/12/...

Hikikomori 6mo ago

And the US has never stolen IP?

thelastgallon 6mo ago

No, they haven't! https://apnews.com/general-news-b40414d22f2248428ce11ff36b88...

myth_drannon 6mo ago

Corporate espionage is ever present, but it is criminalized. The only time the US as a country did something you could call "stealing IP" was after WWII, when it took Nazi rocket scientists and technology. China is the opposite: stealing tech is done by the state apparatus (the USSR did the same, for example by reverse engineering computers).

Frankly, I'm not surprised that this is done; if the US were that far behind, it would probably have done the same to reduce the gap. Everyone is trying to survive and outsmart and outwit the other, instead of collaborating.

archerx 6mo ago

I want China to release GPUs with a ton of VRAM, 128 GB to 256 GB. It doesn’t matter if they are half as fast as Nvidia, because having a big model at a reasonable speed is better than not being able to run them at all. AMD could have done this and had a massive impact on Nvidia’s market share, but they chose not to because reasons.
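For a rough sense of what 128 to 256 GB of VRAM buys, a weights-only estimate is parameters × bits per parameter ÷ 8. The sketch below is a back-of-the-envelope helper, not a sizing tool: the 20% `headroom` for KV cache and activations is an assumption, and real requirements depend on context length, batch size, and runtime.

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int, vram_gb: float,
         headroom: float = 0.2) -> bool:
    """True if the weights plus an assumed fractional headroom fit in vram_gb."""
    return weights_gb(params_billion, bits_per_param) * (1 + headroom) <= vram_gb


# A 70B-parameter model on a hypothetical 128 GB card at three precisions.
for bits in (16, 8, 4):
    print(bits, weights_gb(70, bits), fits(70, bits, vram_gb=128))
# 16 140.0 False
# 8 70.0 True
# 4 35.0 True
```

Under these assumptions a 128 GB card holds a 70B model only once it is quantized to 8 bits or fewer.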

sspiff 6mo ago

There are signs that China is not open-sourcing its SOTA models anymore. Both Huawei and Qwen (Qwen-Max, WAN 2.5) have launched flagship models that have yet to be open-sourced.

natrys 6mo ago

Qwen's Max series has always been closed-weight; it's not the policy change you are alluding to.

What exactly is Huawei's flagship series anyway? Their PanGu line is open-weight, but Huawei isn't yet in the LLM-making business; their models are only meant to signal that it's possible to do training and inference on their hardware, that's all. No one actually uses those models.

camel_Snake 6mo ago

Small counterpoint, but there are also two new players putting out SOTA open-source models (Moonshot's Kimi and Zhipu's GLM), so we're still seeing the same number of models overall, just via newer entrants.

narrator 6mo ago

Peaceful competition is a good thing. It's better than a unified one world government throttling everybody.

belter 6mo ago

China is a nation of engineers... The US has been relying on H-1B immigrants. Science is under attack. The truth is the US already lost: https://youtu.be/whVlI6H4d-4

knowitnone3 6mo ago

It's much easier to copy what others are doing instead of spending the time and money for research and engineering. It's also much easier if you steal the tech. I could never have invented a bicycle but I can sure make a copy of one.

FpUser 6mo ago

"... instead of spending the time and money for research and engineering..."

China has plenty of R&D and science now.

downrightmike 6mo ago

That's how it usually goes, fully expected

coliveira 6mo ago

You mean, thank the US for their FAILED "civilizational" gate keeping.

amelius 6mo ago

Another outcome may be that we now have to learn Chinese to understand their datasheets ...

anonzzzies 6mo ago

I was doing this in the '70s and '80s with electronics from Hong Kong and Japan. For the nice cheap stuff (I was very young), the sheets were all in languages I basically had to pattern-match against notes from others on BBSes and at meetups.

IT4MD 6mo ago

I believe this is a Pollyanna take on AI. There is nothing about humans that tells us humans will bring AI to fruition for other humans, and there is a mountain of evidence showing how it will be used to abuse humans instead... for profits/power/whatever horse shit the masters of the universe have decided upon.

braza 6mo ago

Does someone know if there's some equivalent of those engineering/research blogs for Chinese companies?

I used to follow the ones from Western companies, but honestly, after a while, I would like to see cases from companies I consider a good engineering benchmark for everyone who doesn't work at FAANG.

supriyo-biswas 6mo ago

Chinese companies' blogs will often run articles like this[1] about a new innovation or optimization they made, but these are often mixed in with marketing articles.

I would also assume there's a lot of content in the native Chinese forums, which unfortunately, as an English-speaking person, I wouldn't be able to easily refer to :(

[1] https://www.alibabacloud.com/blog/how-does-alibaba-ensure-th...

ddelnano 6mo ago

Does anyone know how their KV cache sync mechanism compares to newer P2P communication layers like nixl, uccl p2p, etc.?

The authors mention that NCCL and Ray initialization were too slow (see quote below), but from the description it sounds like they’ve reimplemented a layer that’s increasingly being standardized by frameworks like nixl and uccl.

> Distributed executor: Inference engines support model parallelism via distributed executors (e.g., Ray [32] and NCCL [9]), whose initialization takes tens of seconds.
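To put the KV-cache sync question in perspective, the data such a layer has to move per request grows linearly with context length. A standard estimate for a transformer with grouped-query attention is 2 (K and V) × layers × KV heads × head dim × bytes per element, per token. The configuration below is hypothetical and not taken from the paper; `kv_cache_bytes` is an illustrative helper, not part of nixl, uccl, or NCCL.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Bytes of K and V cached for `tokens` tokens (bf16/fp16 by default)."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * tokens


# Hypothetical small-model config: 28 layers, 4 KV heads, head_dim 128, bf16.
per_token = kv_cache_bytes(1, layers=28, kv_heads=4, head_dim=128)
ctx_4k = kv_cache_bytes(4096, layers=28, kv_heads=4, head_dim=128)
print(per_token // 1024, "KiB per token")                # 56 KiB per token
print(ctx_4k / 2**20, "MiB for a 4096-token context")    # 224.0 MiB ...
```

At a few hundred MiB per long request, how fast that state can be moved between GPUs depends heavily on the interconnect and the transfer layer, which is why the comparison asked about matters.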

checker659 6mo ago

They are working with tiny models. Not sure how well it'd scale to bigger models (if at all).

CaptainOfCoit 6mo ago

They're all LLMs, so no, not tiny, but not exactly huge either:

> Our current deployment runs in a cross-region cluster comprising 213 H20 GPUs, serving twenty-eight 1.8–7B models (TP=1) and nineteen 32–72B models (TP=4).
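The headline numbers at the top of the thread can be checked with a little arithmetic: going from 1,192 GPUs to the 213 quoted here is where the 82% comes from, and 17.7% of GPUs serving 1.35% of requests means those models held roughly 13 times more GPU share than request share.

```python
before_gpus, after_gpus = 1192, 213
saving = 1 - after_gpus / before_gpus
print(f"GPU saving: {saving:.1%}")  # GPU saving: 82.1%

# Shares quoted in the article: 17.7% of GPUs served 1.35% of requests.
gpu_share, request_share = 0.177, 0.0135
overprovision = gpu_share / request_share
print(f"GPU share per unit of request share: {overprovision:.1f}x")  # ...: 13.1x
```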

jeffybefffy519 6mo ago

I still think Nvidia has the most to lose in the AI race, as optimisations like this continue, coupled with better ASICs.

ibejoeb 6mo ago

Sounds like this virtual GPU is a separate scheduler. I wonder what kind of latency is introduced by marshaling all that data around.
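A toy model of why that marshaling cost matters: one GPU takes turns between requests for different models, decoding up to `quantum_tokens` tokens per turn, and pays a fixed `switch_ms` whenever the active model changes. This is a simplified round-robin illustration, not Aegaeon's scheduler; `Request`, `simulate`, and all the costs are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    model: str
    tokens_left: int


def simulate(requests: list[Request], quantum_tokens: int,
             token_ms: float, switch_ms: float) -> tuple[float, float]:
    """Round-robin requests on one GPU, decoding up to `quantum_tokens` per turn.

    Every change of active model costs `switch_ms` (a stand-in for moving
    weights / KV cache around). Returns (total_ms, time_spent_switching_ms).
    """
    queue = deque(requests)
    loaded = None
    total = switching = 0.0
    while queue:
        req = queue.popleft()
        if req.model != loaded:      # model switch: pay the marshaling cost
            total += switch_ms
            switching += switch_ms
            loaded = req.model
        n = min(quantum_tokens, req.tokens_left)
        total += n * token_ms        # decode n tokens for this request
        req.tokens_left -= n
        if req.tokens_left > 0:
            queue.append(req)        # back of the line for its next turn


    return total, switching


fine = simulate([Request("A", 100), Request("B", 100)], 10, 20.0, 500.0)
coarse = simulate([Request("A", 100), Request("B", 100)], 100, 20.0, 500.0)
print(fine, coarse)  # (14000.0, 10000.0) (5000.0, 1000.0)
```

With 10-token turns, switching eats most of the time; with whole-request turns it is a small fraction, but one model then waits for the other. A real token-level scheduler has to make switches cheap enough that finer interleaving pays off, which is exactly the latency question raised here.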

catigula 6mo ago

Sounds like they stopped doing something stupid.

shoeb00m 6mo ago

Would this make cloud providers running low volume fine-tuned models more economically viable?

lnxg33k1 6mo ago

Lots of shareholders here, move along, there is nothing to read

throwaway48476 6mo ago

It's easy enough for a well-resourced entity to take a pre-trained model and deploy it on new hardware to save on the NVDA tax. It's far less likely for research and model training to happen outside the mature NVDA ecosystem.

mighmi 6mo ago

To what extent is this practice applicable to other loads?

wsfung2008 6mo ago

This is for platforms that serve many different models, most of which have very low usage. e.g. huggingface, civitai
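The static version of the idea is easy to sketch: if most models are rarely used, packing several onto one GPU beats giving each its own. Below is classic first-fit-decreasing bin packing by weight size. It is a swapped-in illustration, not Aegaeon's actual scheduler. It ignores KV cache, and the model names, sizes, and 80 GB capacity are hypothetical.

```python
def first_fit_decreasing(model_gb: dict[str, float], gpu_gb: float) -> list[list[str]]:
    """Assign models to GPUs with first-fit decreasing (weights only)."""
    gpus: list[list[str]] = []
    free: list[float] = []
    # Largest models first; sorted() is stable, so ties keep insertion order.
    for name, size in sorted(model_gb.items(), key=lambda kv: kv[1], reverse=True):
        if size > gpu_gb:
            raise ValueError(f"{name} needs {size} GB, more than one GPU holds")
        for i, room in enumerate(free):
            if size <= room:          # first GPU with enough free memory
                gpus[i].append(name)
                free[i] -= size
                break
        else:                         # no existing GPU fits: open a new one
            gpus.append([name])
            free.append(gpu_gb - size)
    return gpus


models = {"a": 16, "b": 16, "c": 60, "d": 30}  # hypothetical sizes in GB
print(first_fit_decreasing(models, gpu_gb=80))  # [['c', 'a'], ['d', 'b']]
```

Four models that would each get a dedicated GPU fit on two here. Static packing only helps with memory, though. It does nothing when two co-located models are busy at the same time, which is where finer-grained scheduling comes in.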

wslh 6mo ago

How feasible is it that, within a five-year horizon, new optimized "equations" will cut the need for more GPUs?

aoeusnth1 6mo ago

Not feasible.

nickysielicki 6mo ago

> Distributed executor: Inference engines support model parallelism via distributed executors (e.g., Ray [32] and NCCL [9]), whose initialization takes tens of seconds.

I mean, it really shouldn't take tens of seconds for those initialization(s) to occur. There's no good fundamental reason that it should take that long. It's just bloat.
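A rough cold-start budget shows why executor initialization matters as much as it does. Bringing a model up means reading its weights and then setting up the executor. The read bandwidth and the 20 s init figure below are assumptions chosen to match "tens of seconds", not measurements, and `cold_start_s` is a hypothetical helper.

```python
def cold_start_s(params_billion: float, bytes_per_param: float,
                 read_gb_per_s: float, init_s: float) -> float:
    """Rough time to bring a model up: read the weights, then initialize the executor."""
    weights_gb = params_billion * bytes_per_param
    return weights_gb / read_gb_per_s + init_s


# Hypothetical: bf16 weights (2 bytes/param), 5 GB/s effective read, 20 s of init.
for size in (7, 72):
    print(size, round(cold_start_s(size, 2, 5, 20), 1))
# 7 22.8
# 72 48.8
```

For the small models, the fixed init time dominates the weight transfer several times over, so trimming it (the "bloat" point) would pay off more than faster storage.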

t0lo 6mo ago

Is this another nail in the gpu/ai stock market bubble coffin?
