Nvidia Unveils Grace: A High-Performance Arm CPU for Use in Big AI Systems

(anandtech.com)

324 points · haakon · 5y ago · 203 comments

149 comments · 28 top-level

lprd5y ago· 31 in thread

So is ARM the future at this point? After seeing how well Apple's M1 performed against a traditional AMD/Intel CPU, it has me wondering. I used to think that ARM was really only suited for smaller devices.

mhh__5y ago

The next decade is ARM's for the taking, but if Intel and AMD can make good cores then it's nowhere close to a slam dunk.

One of the reasons why M1 is good is pure and simple that it has a pretty enormous transistor budget, not solely because it's ARM.

api5y ago

Being ARM has something to do with it. The x86 instruction decoder may be only about ~5% of the die, but it's 5% of the die that has to run all the time. Think about how warm your CPU gets when you run e.g. heavy FPU loads and then imagine that's happening all the time. You can see the power difference right there.

It's also very hard to achieve more than 4X parallelism (though I think Ice Lake got 6X at some additional cost) in decode, making instruction level parallelism harder. X86's hack to get around this is SMT/hyperthreading to keep the core fed with 2X instruction streams, but that adds a lot more complexity and is a security minefield.

Last but not least: ARM's looser default memory model allows for more read/write reordering and a simpler cache.

ARM has a distinct simplicity and low-overhead advantage over X86/X64.

pbsd5y ago

The x86 decoder is not running all the time; the uops cache and the LSD exist precisely to avoid this. With instructions fed from the decoders you can only sustain 4 instructions per cycle, while to get to 5 or 6 your instructions need to be coming from either the uops cache or the LSD. In the case of Zen 3, the cache can deliver 8 uops per cycle to the pipeline (but the overall throughput is limited elsewhere at 6)!

Furthermore, the high-performance ARM designs, starting with the Cortex-A77, started using the same trick---the 6-wide execution happens only when instructions are being fed from the decoded macro-op cache.

NortySpock5y ago

> x86 instruction decoder may be only about ~5% of the die

What percent of the die is an ARM instruction decoder?

mhh__5y ago

This is why I said it's ARM's for the taking.

I'm not familiar with how ARM's memory model affects the cache design. Source?

tambourine_man5y ago

>…is pure and simple that it has a pretty enormous transistor budget

There's a lot of brute force, yes, but it's not the only reason. There are lots of smart design decisions as well.

mhh__5y ago

"One of the reasons" I did say.

amelius5y ago

Yes, but those decisions optimize for the single user laptop case, not for e.g. servers.

phendrenad25y ago

It really comes down to how well they can emulate X86. People aren't going to give up access to 3 decades of Windows software.

pjerem5y ago

I'm sure ARM already took over x86 if you have a wider definition of personal computers. And a lot of people already gave up access to 3 decades of Windows software by using their phone or tablet as their main device.

Plus, most of the last decade's software runs on some sort of VM or another (be it the JVM, the CLR, a JavaScript engine or even LLVM).

Soon (in years), x86 will only be needed by professionals that are tied to really old software. And those particular needs will probably be satisfied by decent emulation.

ravi-delia5y ago

I've seen things like this a lot, and it's a bit confusing. If parts of the M1's performance are due to throwing compute at the problem, why hasn't Intel been doing that for years? What about ARM, or the M1, allowed this to happen?

NathanielK5y ago

Intel has. Many M1 design choices are fairly typical for desktop x86 chips, but unheard of with ARM.

For example, the M1 has 128-bit wide memory. This has been standard for decades on the desktop (dual channel), but unheard of in cellphones. The M1 also has similar amounts of cache to the new AMD and Intel chips, but that's several times more than the latest Snapdragon. Qualcomm also doesn't just design for the latest node. Most of their volume is on cheaper, less dense nodes.

dpatterbee5y ago

Buying the majority of TSMC's 5nm process output helped. It's a combination of good engineering, the most advanced process, and Intel shitting themselves, I would say.

jayd165y ago

Another reason is the something like 150% memory bandwidth and I'm sure there are other simple wins along those lines.

The M1 isn't necessarily a win for Arm in general. Other manufacturers weren't competing before and it's yet to be seen if they will.

mhh__5y ago

It's the memory, stupid!

NathanielK5y ago

150% compared to what?

kllrnohj5y ago

It will come down entirely to who can sustain a good CPU core.

Currently Apple is the only company making performance-competitive ARM cores that can make a reasonable justification for an architecture switch.

Otherwise AMD's CPUs are still ahead of everyone else, including all other ARM CPU cores not made by Apple. And even Intel is still faster in places where performance matters more than power efficiency (eg, desktop & PC gaming)

floatboth5y ago

Arm's Neoverse cores are doing pretty well in the datacenter space — on AWS, the Graviton2 instances are currently the best ones for lots of use cases. It's clear that core designs by Arm are really good. The problem currently is the lag between the design being done and various vendors' chips incorporating it.

upd: oh also in the HPC world, Fujitsu with the A64FX seems to be like the best thing ever now

kllrnohj5y ago

Graviton2 is competitive sometimes with Epyc, but also falls far behind in some tests (eg, Java performance is a bloodbath). Overall, across the majority of tests, Neoverse consistently comes up short of Milan even when Neoverse is given a core-count advantage. And critically, the per-core performance of Graviton2 / Neoverse is worse, and per-core performance is what matters to the consumer space.

But it can't just be competitive; it needs to be significantly better in order for the consumer space to care. Nobody is going to run Windows on ARM just to get equivalent performance to Windows on X86, especially not when that means most apps will be worse. That's what's really impressive about the M1, and so far it is unique to Apple's ARM CPUs.

> oh also in the HPC world, Fujitsu with the A64FX seems to be like the best thing ever now

A64FX doesn't appear to be a particularly good CPU core; rather, it's a SIMD powerhouse. It's the AVX-512 problem - when you can use it, it can be great. But you mostly can't, so it's mostly dead weight. Obviously in the HPC space this is a different scenario entirely, but that's not going to translate to the consumer space at all (and it's not an ARM advantage, either - 512-bit SIMD hit the consumer space via x86 first with Intel's Rocket Lake).

rubatuga5y ago

Fujitsu flying under the radar while having the fastest cpu ever made haha

huac5y ago

so then we think about what makes Apple's M1 so good. one hard-to-replicate factor is that they designed their hardware and software together, the ops which MacOS uses often are heavily optimized on chip.

but one factor that you can replicate is colocating memory, CPU, and GPU, the system-on-chip architecture. that's what Nvidia looks to be going after with Grace, and I'm sure they've learned lessons from their integrated designs e.g. Jetson. very excited to see how this plays out!

kllrnohj5y ago

> one hard-to-replicate factor is that they designed their hardware and software together, the ops which MacOS uses often are heavily optimized on chip.

Not really, they are still just using the same ARM ISA as everyone else. The only hardware/software integration magic of the M1 so far seems to be the x86 memory model emulation mode, which others could definitely replicate.

> but one factor that you can replicate is colocating memory, CPU, and GPU, the system-on-chip architecture.

AMD introduced that in the x86 world back in 2013 with their Kaveri APU ( https://www.zdnet.com/article/a-closer-look-at-amds-heteroge... ), and it's been fairly typical since then for on-die integrated GPUs on all ISAs.

aeyes5y ago

Amazon's ARM chips are performance-competitive as well; for many workloads you can expect at least similar performance per core at the same clock speed.

mr_toad5y ago

> I used to think that ARM was really only suited for smaller devices.

The current fastest supercomputer uses ARM.

https://en.wikipedia.org/wiki/Fugaku_(supercomputer)

CalChris5y ago

Apple isn't entering the cloud market. Moreover the M1 isn't a cloud cpu. The M1 SOC emphasizes low latency and performance per watt over throughput.

enos_feedler5y ago

AWS has Mac mini, and is expected to add M1 mini into the mix [1]. I expect Apple to take lots of silicon design into data centers and edge computing. Over time I can see a lot of mobile apps running backend through Apple silicon with a full Apple cloud software stack to provide data management around security and privacy.

1. https://9to5mac.com/2021/02/02/m1-mac-mini-in-the-cloud/

fulafel5y ago

The instruction set doesn't make a significant difference technically; the main things about ISAs are the monopolies (patents) tied to them, and software compatibility.

rvanlaar5y ago

I'm interested in your thoughts on why this doesn't make a significant difference. From what I've read, the M1 has a lot of tricks up its sleeve that are next to impossible on X86. For example ARM instructions can be decoded in parallel.

fulafel5y ago

Instruction decoding is more power efficient on ARM, but x86 has solved it as a perf bottleneck, with the trace/uop caches and by doing some speculative work in the decoders. (Parallel decoding is also old hat and not an M1 or ARM-land invention; it's trivial with a RISC-style insn format.) What other tricks do you have in mind?

More broadly, as to why the ISA doesn't make a big difference: the major differences are at the microarchitecture level, since OoO processors have such flexible dataflow machinery in them that you can kind of view the frontend as compiler technology. x86 and ARM are decades-old ISAs that have seen many, many rounds of iteration in the form of added instructions and even backwards-incompatible reboots at the 64-bit transition points, so most hindrances have been fixed.

In the olden days ISAs were important because processors were orders of magnitude simpler, and instructions were processed as-is very statically (to the point that microarchitectural artifacts like branch delay slots were enshrined in some ISAs). This meant that eg the complexity of individual instructions could be a bottleneck to how fast a chip could be clocked. Or in CISC land your ISA might have been so complex that the CPU was a microcoded implementation of the ISA and didn't have any hardwired fast instructions...

bitwize5y ago

> So is ARM the future at this point?

The near future. A few years out, RISC-V is gonna change everything.

dkjaudyeqooe5y ago

ARM is the present, RISC-V is the future and Intel is the past.

The magic of Apple's M1 comes from the engineers who worked on the CPU implementation and the TSMC process.

The architecture has some impact on performance but I think it is simplicity and ease of implementation that factor most into how well it can perform (as per the RISC idea). In that sense Intel lags for small, fast and efficient processors because their legacy architecture pays a penalty for decoding and translation (into simpler ops) overhead. Eventually designs will abandon ARM for RISC-V for similar reasons as well as financial ones.

Really, today it's a question of who has the best implementation of any given architecture.

modeless5y ago· 16 in thread

I hope they make workstations. I want to see some competition for the eventual Apple Silicon Mac Pro.

titzer5y ago

I think Apple did Arm an unbelievable favor by absolutely trouncing all CPU competitors with the M1. By being so fast, Apple's chip attracts many new languages and compiler backends to Arm that want a piece of that sweet performance pie. Which means that other vendors will want to have Arm offerings, and not, e.g., RISC-V.

I have no idea what Apple's plans for the M1 chip are, but if they had manufacturing capacity, they could put oodles of these chips into datacenters and workstations the world over and basically eat the x86 high-performance market. The fact that the chip uses so little power (15W) means they can absolutely cram them into servers where CPUs can easily consume 180W. That means 10x the number of chips for the same power, and not all concentrated in one spot. A lot of very interesting server designs are now possible.
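
As a rough check on the power arithmetic in this comment, here is a minimal Python sketch. It uses only the 15 W and 180 W figures quoted above; the helper name `chips_per_budget` is made up for illustration, and the calculation ignores memory, networking and power-delivery overhead, so treat it as an upper bound rather than a server design.

```python
def chips_per_budget(budget_watts: float, chip_watts: float) -> int:
    """Return how many chips of a given power draw fit in a fixed power budget."""
    if chip_watts <= 0:
        raise ValueError("chip_watts must be positive")
    return int(budget_watts // chip_watts)


M1_WATTS = 15           # per-chip figure quoted in the comment above
SERVER_CPU_WATTS = 180  # per-socket figure quoted in the comment above

print(chips_per_budget(SERVER_CPU_WATTS, M1_WATTS))  # prints 12
```

On those two numbers the ratio comes out at 12, slightly above the "10x" in the comment.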

jillesvangurp5y ago

I think you are half right in the sense that people now know Intel architectures are not what they want/need. RISC-V chipsets will take a bit longer to mature but can in principle do the same kinds of things that Apple is doing with M1 to keep energy usage low and throughput high. However, the key selling feature with RISC-V is reduced IP licensing needs (cost).

With Nvidia buying Arm and producing their own chipsets, that's no small advantage for companies that are not Nvidia (or Apple, who have a perpetual license already). If I were Intel, that's what I'd be looking at right now. Same for perhaps AMD. The clock is ticking on their x86-only strategy and it takes time to develop new architectures, even if you do license somebody else's instruction set.

A counter argument to this would be software compatibility. Most of the porting effort to make Linux, Windows, and macOS run on Arm already happened years ago. It's a mature software ecosystem. Software is actually the hardest part of shipping new hardware architectures. Without that, hardware has no value.

And a counter argument to that is that Apple is showing instruction set emulation actually works reasonably well: it is able to run x86 software at reasonable performance on the M1. So, running natively matters less these days. If you look at QEMU, they have some interesting work going on around e.g. emulated GPU where the goal is not to emulate some existing GPU but to create a virtual-only GPU device called Virgil 3D that can run efficiently on just about anything that supports OpenGL. Don't expect to set fps records of course. The argument here is that the software ecosystem is increasingly easy to adapt to new chip architectures as a lot of stuff does not require access to bare metal. Google uses this strategy with Android: native compilation happens (mostly) just in time after you ship your app to the app store.

klelatti5y ago

It's hard to imagine that until a few months ago it was very difficult to get a decent Arm desktop / laptop. I imagine lots of developers working now to fix outstanding Arm bugs / issues.

giantrobot5y ago

While I'm sure lots of projects have actual ARM-related bugs, there was a whole class of "we didn't expect this platform/arch combination" compilation bugs that have seen fixes lately. It's not that the code has bugs on ARM; a lot of OSS has been compiling on ARM for a decade (or more) thanks to Raspberry Pis, Chromebooks, and Android, but build scripts didn't understand "darwin/arm64". Back in December installing stuff on an M1 Mac via Homebrew was a pain but it's gotten significantly easier over the past few months.

But a million (est) new general purpose ARM computers hitting the population certainly affects the prioritizing of ARM issues in a bug tracker.
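
To make the "we didn't expect this platform/arch combination" class of bug concrete, here is a small Python sketch of the kind of check a build script might do. The target table and alias map are hypothetical and do not come from Homebrew or any real project; the only real APIs used are `platform.system()` and `platform.machine()`. The point is that an OS the script already knows ("darwin") and an architecture it already knows ("arm64"/"aarch64") can still fail if the script only lists the pairs it has seen before, or does not normalize the architecture name.

```python
import platform
from typing import Tuple

# Hypothetical set of (os, arch) pairs this build script knows how to build for.
SUPPORTED_TARGETS = {
    ("linux", "x86_64"),
    ("linux", "arm64"),
    ("darwin", "x86_64"),
    ("darwin", "arm64"),  # the pair that older scripts were typically missing
}

# The same architecture is reported under different names on different OSes.
ARCH_ALIASES = {"aarch64": "arm64", "amd64": "x86_64"}


def normalize_target(system: str, machine: str) -> Tuple[str, str]:
    """Map raw platform strings to a canonical (os, arch) pair."""
    arch = machine.lower()
    return system.lower(), ARCH_ALIASES.get(arch, arch)


def is_supported(system: str, machine: str) -> bool:
    """Return True if the normalized pair is in the supported-target table."""
    return normalize_target(system, machine) in SUPPORTED_TARGETS


# Report the target for whatever machine runs this script.
current = normalize_target(platform.system(), platform.machine())
print(current, "supported" if current in SUPPORTED_TARGETS else "unsupported")
```

A table keyed on the pair, rather than separate OS and architecture checks, makes a missing combination fail loudly instead of silently picking the wrong toolchain.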

mhh__5y ago

> compiler backends to Arm that want a piece of that sweet performance pie

How many compilers didn't support ARM?

titzer5y ago

A lot of hobbyist ones, e.g. But even for mainstream compilers, Arm has been a second-class citizen where developers would not necessarily test on Arm. E.g. I used to work on V8, and we had partners at ARM who would help support the 32- and 64-bit ports. While I often did go ahead and port my changes to Arm, it wasn't always required, as they could do heavy lifting and debugging for us, sometimes. We didn't have Arm hardware on our desks to test; V8 literally has its own CPU simulators built into it, just for running the generated code from its own JITs. We had good regression testing infrastructure, but there is nothing quite like having first-class, on-desk hardware to test with, preferably to develop directly on.

dhruvdh5y ago

They are licensing ARM cores, which as of now cannot compete with Apple silicon.

While they are using some future ARM core, and I've read rumors that future designs might try to emulate what has made Apple cores successful, we cannot say whether Apple designs will stagnate or continue to improve at the current rate.

There is potential for competition from Qualcomm after their Nuvia acquisition though.

ac295y ago

Maybe not in single-threaded performance, but Apple has no server-grade parts. Ampere, for example, is shipping an 80-core ARM N1 processor that puts out some truly impressive multithreaded performance. An M1 Mac is an entirely different market - making a fast 4+4 core laptop processor doesn't necessarily translate into making a fast 64+ core server processor.

martinald5y ago

To be honest it does though. You could take 10 M1 chips (40+40 cores, with around 30TFLOPS of GPU) put them into a server and even at full load you would be at 150W, which is about half of the high core count Xeons. Obviously not as simple as that, but the thermal fundamentals are right.

The 40 core Xeon also costs around 10k.

There's rumors that the new iMac will have a 20 core M1 (16+4). I imagine that will be faster than even the top line $10k Xeon.

I have absolutely no doubt Apple could put together a server based on the M1 which would wipe the floor with Intel if they wanted to. But I very much doubt they will since it is so far out of their core competencies these days.

I have absolutely no doubt Apple could produce a ridiculously good server CPU from the M1. I doubt they will actually do it though.

devmor5y ago

What do you mean ARM cores can't compete with Apple silicon? "Apple silicon" are ARM cores.

mlyle5y ago

He means cores made by ARM, not cores implementing the ARM ISA. Currently, the cores designed by ARM cannot touch the Apple M1.

dharmab5y ago

Apple Silicon is compatible with the ARM instruction set but they are not "just ARM cores" in their internal design.

adgjlsfhk15y ago

It seems weird to me to say that arm cores can't compete with apple silicon given that apple doesn't own fabs. They are using arm cores on TSMC silicon (exactly the same as this).

seabrookmx5y ago

> They are using arm cores on TSMC silicon (exactly the same as this)

No, the Apple Silicon chips use the Arm _instruction set_ but they do not use Arm's core designs. Apple designs its cores in house, much like Qualcomm does with Snapdragon. Both of these companies have an architectural license which allows them to do this.

macksd5y ago

You probably mean less powerful than this, but they do: https://www.nvidia.com/en-us/deep-learning-ai/solutions/work....

modeless5y ago

Yes they make workstations, but they don't make ARM workstations. Yet. They already have ARM chips they could use for it, but they went with x86 instead despite the fact that they have to purchase the x86 chips from their direct competitor. Also, yes, less than $100k starting price would be nice.

titzer5y ago· 10 in thread

Given that there are essentially no architectural details here other than bandwidth estimates, and the release timeline is in 2023, how exactly does this count as "unveiling"? Headline should read: "NVidia working on new arm chip due in two years", or something else much more bland.

mrlento2345y ago

Not quite. The CSCS supercomputing center in Switzerland has already started receiving the hardware (https://www.cscs.ch/science/computer-science-hpc/2021/cscs-d...). Perhaps we will see some benchmarks. For wider HPC users, it will only be available in 2023, as the article mentioned.

bdc-hpc5y ago

The Alps system at CSCS will have racks with different processors, to be installed in phases. CSCS has taken delivery of the first racks with AMD EPYC processors, for non-GPU workloads. CSCS will be one of the first customers to get their hands on Grace Hopper, but they will have to wait until 2023.

wombat235y ago

Are there more sources for technical details about the new infrastructure? The interview linked above left me with more questions than answers.

IanCutress5y ago

I suspect that's more racks of storage, not racks of compute. Nothing to suggest it's compute.

seniorivn5y ago

As I understand it, it's compute, just not CPU compute; those CPUs are designed to be good enough for CUDA servers.

DetroitThrow5y ago

Hey Ian, I love reading your posts on Anandtech, you're a fantastic technical communicator.

titzer5y ago

Hopefully some architectural details are forthcoming then! But that is not what is in this article.

temp6675y ago

The CPU cores are probably not that interesting, it's going to be the GPU and interlink stuff (pretty impressive if true) that's going to drive this?

kats5y ago

It says they use Arm Neoverse cores so it is another processor like Fujitsu A64FX and Amazon Graviton 2.

allie15y ago

As AMD proved to us, a lot can happen in 3 years.

gchadwick5y ago· 8 in thread

It'd be interesting to know if NVidia are going for an ARMv9 core, in particular if they'll have a core with an SVE2 implementation.

It may be they don't want to detract from focus on the GPUs for vector computation so prefer a CPU without much vector muscle.

Also interesting that they're picking up an arm core rather than continuing with their own design. Something to do with the potential takeover (the merged company would only want to support so many micro-architectural lines)?

klelatti5y ago

This has got me wondering whether an Nvidia owned Arm could limit SVE2 implementations so as not to compete with Nvidia's GPU. That would certainly be the case for Arm designed cores - not a desirable outcome.

MikeCapone5y ago

I doubt it, it's not like the market for acceleration is stagnant and saturated and they need to steal some marketshare points from one side to help the other.

It's all greenfield and growing so far, they'll win more by having the very best products they can make on both sides.

mlyle5y ago

You'd think. But it wouldn't be the first time a new product is hampered to not slightly theoretically cannibalize an existing product family.

adrian_b5y ago

They have said clearly that the core is licensed from ARM and is one of the future Neoverse models.

There was no information about whether it will have any good SVE2 implementation. On the contrary, they emphasized only the integer performance and the high-speed memory interface.

gchadwick5y ago

Here's Anandtech's article on the previous Neoverse V1/N2 announcement: https://www.anandtech.com/show/16073/arm-announces-neoverse-... Arm weren't saying anything official, but Anandtech did a little digging and reckons V1 is SVE 1 and v8, and N2 could be Armv9 with SVE 2.

I'd suspect NVidia would be using the V1 here as it's the higher performing core, but there's no way to be certain.

dragontamer5y ago

Neoverse V1 has SVE, Neoverse E or N do not.

"E" is efficiency, N is standard, V is high-speed. IIRC, N is the overall winner in performance/watt. Efficiency cores have the lowest clock speed (overall use the least amount of watts/power). V purposefully goes beyond the performance/watt curve for higher per-core compute capabilities

Teongot5y ago

Neoverse-N2 will have SVE2 (source https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aar... )

theonlyklas5y ago

I think they will use SVE2 because I assume they'll need to perform vector reads/writes to NVLink connected peripherals to reach that 900GB/s GPU-to-CPU bandwidth metric they described.

DonHopkins5y ago· 8 in thread

I love the name "Grace", after Grace Hopper.

paulmd5y ago

There's a tendency to use first names to refer to women in professional settings or political power that is somewhat infantilizing and demeaning.

I doubt anyone really deliberately sets out to be like "haha yessss today I shall elide this woman's credentials", but this is one of those unconscious gender-bias things that is commonplace in our society and is probably best to try and make a point of avoiding.

https://news.cornell.edu/stories/2018/07/when-last-comes-fir...

https://metro.co.uk/2018/03/04/referring-to-women-by-their-f...

(etc etc)

I'd prefer they used "Hopper" instead, in the same way they have chosen to refer to previous architectures by the last names of their namesakes (Maxwell, Pascal, Ampere, Volta, Kepler, Fermi, etc). I'd see that as being more professionally respectful for her contributions.

But yes I very much like the idea of naming it after Hopper.

bloak5y ago

Perhaps you're being downvoted because it's a tangent. It's a real phenomenon, though, and an interesting one. Of course there are many things that influence which parts of someone's full name get used, and if the tendency is a problem it's a trivial one compared to all the other problems that women face, but, yes, in general it would probably be a good idea to be more consistent in this respect.

Vaguely related: J. K. Rowling's "real" full name is Joanne Rowling. The publisher "thought a book by an obviously female author might not appeal to the target audience of young boys".

There's another famous (in the UK at least) computer scientist called Hopper: Andy Hopper. So "G.B.M. Hopper", perhaps? That would have more gravitas than "Andy"!

trynumber95y ago

Hopper was already reserved for an Nvidia GPU: https://en.wikipedia.org/wiki/Hopper_(microarchitecture)

paulmd5y ago

Yeah, I dunno what is going on with that, I assumed that had changed if they were going to use the name "grace" for another product.

I guess I'm not sure if "Hopper" refers to the product as a whole (like Tegra) and early leakers misunderstood that, or whether Hopper is the name of the microarchitecture and "Grace" is the product, or if it's changed from Hopper to Grace because they didn't like the name, or what.

Otherwise it's a little awkward to have products named both "grace" and "hopper"...

adrian_b5y ago

I do not believe that referring to women by their first names is somewhat infantilizing and demeaning.

Unfortunately, at least in most Western societies, using first names is the only way to refer unambiguously to women.

By tradition, in most Western countries women do not have their own family names, but use either the family name of their father until marriage, or the family name of their husband after that.

So while Grace is the computer scientist, Hopper is her husband and Murray is her father. Using the name Grace makes clear who is honored.

Nowadays, in many places there are laws that allow women to choose their family names or to combine the family names.

Nevertheless, the old tradition is still entrenched, so searching for a certain woman, when the last information about her is many years old, can be difficult due to unpredictable family name changes.

Ideally, a human should keep forever the family name used at birth and the parents should choose one of their family names for the children.

ashtonbaker5y ago

So to be clear, "Hopper" would unambiguously refer to Vincent Foster Hopper in this context, and not famed computer scientist Grace Hopper? Not Vincent Foster's father? What if he was adopted and began life with a different family name? Why make this distinction specifically for women, so that a last name cannot possibly refer to them?

pezezin5y ago

> Ideally, a human should keep forever the family name used at birth and the parents should choose one of their family names for the children.

I prefer the Spanish way: have two family names. We have been doing it for centuries; it baffles me that other countries find it so difficult to adopt a similar system.

hderms5y ago

I feel like there's a non-zero chance they named it Grace instead of Hopper so their new architecture doesn't sound like a bug or a frog or something. You could be right, though

Aissen5y ago· 6 in thread

GPU-to-CPU interface >900GB/sec NVLink 4. What kind of interconnect is that? Is that even physically realistic?

robomartin5y ago

Well, PCIe 6 x16 will do 128 GB/s. Of course, the real question is how many transactions per second you get. PCIe 6 signals at 64 GT/s per lane, but that figure counts transfers, not transactions.

Speaking in general terms, data rate and transaction rate don't necessarily match because a transaction might require the transmitter to wait for the receiver to check packet integrity and then issue acknowledgement to the transmitter before a new packet can be sent.

Yet another case, again, speaking in general terms, would be the case of having to insert wait states to deal with memory access or other processor architecture issues.

Simple example, on the STM32 processor you cannot toggle I/O in software at anywhere close to the CPU clock rate due to architectural constraints (to include the instruction set). On a processor running at 48 MHz you can only do a max toggle rate of about 3 MHz (toggle rate = number of state transitions per second).
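
The acknowledgement case described in this comment can be sketched with a simple stop-and-wait model in Python. This is an illustration of the general point only, not a model of PCIe itself, which keeps multiple packets in flight. The function name and the 100 ns acknowledgement delay are hypothetical; the link rate is derived from the 64 GT/s per lane and 16 lanes mentioned in the thread, assuming one bit per transfer and ignoring encoding overhead.

```python
def stop_and_wait_throughput(payload_bytes: float,
                             link_bytes_per_s: float,
                             ack_delay_s: float) -> float:
    """Effective bytes/s when every packet must be acknowledged before the next is sent."""
    transmit_time_s = payload_bytes / link_bytes_per_s
    return payload_bytes / (transmit_time_s + ack_delay_s)


# 64 GT/s per lane * 16 lanes / 8 bits per byte = 128 GB/s raw, per direction.
LINK_BYTES_PER_S = 64e9 * 16 / 8
ACK_DELAY_S = 100e-9  # hypothetical round-trip wait for the acknowledgement

for payload in (64, 256, 4096):
    effective = stop_and_wait_throughput(payload, LINK_BYTES_PER_S, ACK_DELAY_S)
    print(f"{payload:5d}-byte packets: {effective / 1e9:7.2f} GB/s effective")
```

With any non-zero wait, small packets spend most of their time waiting rather than transmitting, which is why a headline data rate says little about the achievable transaction rate.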

jabl5y ago

> Speaking in general terms, data rate and transaction rate don't necessarily match because a transaction might require the transmitter to wait for the receiver to check packet integrity and then issue acknowledgement to the transmitter before a new packet can be sent.

PCIe has the optional "relaxed ordering" feature, allowing sending new packets before the ACK has been received from preceding ones. Not sure precisely how this works, or whether there is some TCP-like window scaling algorithm in play.

rincebrain5y ago

Well, according to [1], NVIDIA lists NVLink 3.0 as being 50 Gb/s per lane per direction, and lists the total maximum bandwidth of NVSwitch for Ampere (using NVLink 3.0) as 900 GB/s each direction, so it doesn't seem completely out of reach.

[1] - https://en.wikipedia.org/wiki/NVLink

Aissen5y ago

With 50Gb/s per lane, that would be 144 lanes to reach 900GB/s. Quite impressive.

rincebrain5y ago

Fascinatingly, NVIDIA's own docs [1] claim GPU<->GPU bandwidth on that device of 600 GB/s (though they claim total aggregate bandwidth of 9.6 TB/s). Which would be what, 96 and 1536 lanes, respectively? That's quite the pinout.

[1] - https://www.nvidia.com/en-us/data-center/nvlink/
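
The lane counts quoted in the last two comments follow from a unit conversion, sketched below in Python. The helper name `lanes_needed` is made up for illustration, and the sketch assumes the 50 Gb/s per-lane figure applies to a single direction with no protocol overhead.

```python
import math


def lanes_needed(total_gigabytes_per_s: float, lane_gigabits_per_s: float) -> int:
    """Lanes required to carry a bandwidth given in GB/s over lanes rated in Gb/s."""
    total_gigabits_per_s = total_gigabytes_per_s * 8  # bytes -> bits
    return math.ceil(total_gigabits_per_s / lane_gigabits_per_s)


LANE_GBPS = 50  # NVLink 3.0 per-lane, per-direction figure quoted above

for label, gigabytes_per_s in (("900 GB/s", 900), ("600 GB/s", 600), ("9.6 TB/s", 9600)):
    print(f"{label}: {lanes_needed(gigabytes_per_s, LANE_GBPS)} lanes")
# prints 144, 96 and 1536 lanes respectively
```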

freeone30005y ago

Depends on how big you want to make it. If they're willing to go four inches, that'd do it with existing per-pin speeds from NVLink 3.

remexre5y ago· 5 in thread

> Today at GTC 2021 NVIDIA announces its first CPU

Wait, Nvidia's been making ARM CPUs for years now; most memorably Project Denver.

jdsully5y ago

NVIDIA called it their first “data center CPU”. Our helpful reporter simplified it to the point of being flat out wrong. Not uncommon.

justin665y ago

I expected more from a site called VideoCardz.

015a5y ago

Arguably, most memorably, Tegra; the CPU/GPU which powers the Nintendo Switch.

Jasper_5y ago

That uses a licensed ARM Cortex design under the hood.

uxp1005y ago

The Tegra line included Denver and Carmel cores. Tegra was the product line, then the Switch chips have their own names.

api5y ago· 5 in thread

Tangent: Apple should bring back the Xserve with their M1 line, or alternately license the M1 core IP to another company to produce a differently-branded server-oriented chip. The performance of that thing is mind blowing and I don't see how this would compete with or harm their desktop and mobile business.

AnthonyMouse5y ago

The cheapest available Epyc (7313P) has 16 cores and dual socket systems have up to 128 cores and 256 threads. Server workloads are massively parallel, so a 4+4 core M1 would be embarrassed and Apple wouldn't want to subject themselves to that comparison.

But another reason they won't do it is that TSMC has a finite amount of 5nm fab capacity. They can't make more of the chips than they already do.

api5y ago

I'm thinking of a 64-core M1. It would not be the laptop chip.

ac295y ago

A 4+4 core M1 is 16 billion transistors. Some of that is the little cores, GPU, etc., but it's not clear to me it's practical to get, say, 8x larger. That would be 128 billion transistors. As a point of comparison, NVIDIA's RTX 3090 is 28B transistors, and that's a huge, expensive chip.
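
A rough sketch of that arithmetic, assuming purely linear scaling of the whole die (which overstates things, since the GPU and little cores would not need to be replicated 8x). The transistor counts are the ones quoted in the comment above; the helper name `scaled_transistors` is hypothetical.

```python
# Figures quoted in the comment above.
M1_TRANSISTORS = 16e9        # 4+4 core M1, whole SoC
RTX_3090_TRANSISTORS = 28e9  # comparison point


def scaled_transistors(base: float, factor: float) -> float:
    """Naive linear scaling: assumes every block on the die grows by the same factor."""
    return base * factor


big_m1 = scaled_transistors(M1_TRANSISTORS, 8)
ratio = big_m1 / RTX_3090_TRANSISTORS

print(f"8x M1: {big_m1 / 1e9:.0f}B transistors")
print(f"That is about {ratio:.1f}x an RTX 3090")
```

Under this naive assumption the hypothetical chip would carry between four and five times the transistor count of the RTX 3090.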

_ph_5y ago

I am also hoping for a return of the Xserve once Apple makes high-core-count variations of Apple Silicon for the Mac Pro. This would have several benefits. First of all, it would greatly increase the production count of that variant; it could be too expensive to make such a chip just for the Mac Pro. In any case, it should be cheaper than an equivalent Intel CPU, as Apple would not have to pay for Intel's profits. And finally, just the power savings for the vast compute centers Apple operates should mean a lot of money saved too.

bombcar5y ago

How much of that performance is on-chip memory and how usable/scalable is that? An Xserve that is limited to one CPU and can't have more RAM would be pretty mediocre.

lprd5y ago· 5 in thread

hilios5y ago

Depends, performance wise it should be able to compete with or even outperform x86 in many areas. A big problem until now was cross compatibility regarding peripherals, which complicates running a common OS on ARM chips from different vendors. There is currently a standardization effort (Arm SystemReady SR) that might help with that issue though.

Hamuko5y ago

Based on initial testing, AWS EC2 instances with ARM chips performed as well if not better than the Intel instances, but they cost 20% less. The only drawback that I've really encountered thus far was that it complicates the build process.

moistbar5y ago

Does ARM have a uniquely complex build process, or is it the mix of architectures that makes it more difficult?

sumtechguy5y ago

ARM is all over the place with its ISA. x86 has the benefit that most companies made it 'IBM compatible'. There are one-off x86 ISAs but they are mostly forgotten at this point. The ARM CPU family itself is fairly consistent (mostly), but the included hardware is a very mixed bag. The x86, on the other hand, has the history of 'build it to make it work like IBM': all the way from how things boot up to memory space addresses, must-have I/O, etc. ARM may or may not have that, depending on which ISA you target or are creating. Things like the Raspberry Pi have changed some of that, as many are mimicking the Broadcom ISA, and specifically the one in the Raspberry Pi. The x86 arch has also picked up some interesting baggage along the way because of what it is. We can mostly ignore it but it is there. For example, you would not build an ARM board these days with an IDE interface, but some of those bits still exist in the x86 world.

ARM is more of a tool kit to build different purpose-built computers (you even see them show up in USB sticks), while x86 is a particular ISA that has a long history behind it. So you may see something like 'Amazon builds its own ARM computers'. That means they spun their own boards, built their own toolchains (more likely recompiled existing ones), and probably have their own OS distro to match. Each one of those is a fairly large endeavor. When you see something like 'Amazon builds its own x86 boards', they have shaved out the other two parts of that and are focusing on hardware. That they are building their own means they see the value in owning the whole stack. Also, having your own distro means you usually have to 'own' building the whole thing. So I can go grab an x86 gcc stack from my repo provider. They will need to act as the repo owner and build it themselves and keep up with the patches. Depending on what has been added, that can be quite the task all by itself.

Hamuko5y ago

Mix of architectures and the fact that our normal CI server is still x86-based and really didn't want to do ARM builds.
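
One small piece of what a mixed-architecture build has to handle is that the same CPU family goes by different names depending on the tool. A minimal sketch, assuming a build script that tags artifacts per architecture; the alias table, `normalize_arch`, `artifact_name` and the `myapp` project name are all hypothetical, and only `platform.machine()` is a real standard-library call.

```python
import platform

# Common spellings reported by different OSes and tools, mapped to one label each.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Map a raw machine string (e.g. from platform.machine()) to a canonical label."""
    key = machine.lower()
    if key not in _ARCH_ALIASES:
        raise ValueError(f"unsupported architecture: {machine!r}")
    return _ARCH_ALIASES[key]


def artifact_name(project: str, version: str, machine: str) -> str:
    """Build a per-architecture artifact file name (naming scheme is illustrative)."""
    return f"{project}-{version}-linux-{normalize_arch(machine)}.tar.gz"


def host_arch() -> str:
    """Canonical label for the machine running the build."""
    return normalize_arch(platform.machine())


# An x86 CI server and an ARM builder would produce differently named artifacts.
for machine in ("x86_64", "aarch64"):
    print(artifact_name("myapp", "1.0.0", machine))
```

This only covers naming; an x86-based CI server would still need cross-compilation, emulation, or a native ARM runner to produce the ARM artifact at all.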

ksec5y ago· 4 in thread

Based on a future ARM Neoverse core, so basically nothing much to see here from a CPU perspective. What really stands out are those ridiculous numbers from its memory system and interconnect.

CPU: LPDDR5X with ECC Memory at 500+GB/s Memory Bandwidth. ( Something Apple may dip into. R.I.P for Mac with upgradable Memory )

GPU: HBM2e at 2000 GB/s. Yes, three zeros, this is not a typo.

NVLink: 500GB/s

This will surely further solidify CUDA dominance. Not entirely sure how Intel's Xe with oneAPI and AMD's ROCm are going to compete.

Dylan168075y ago

> GPU: HBM2e at 2000 GB/s. Yes, three zeros, this is not a typo.

It's a good step forward but your average consumer GPU is already around a quarter to a third of that and a Radeon VII had 1000 GB/s two years ago.

m_mueller5y ago

I think what you’re missing here is the NVLink part. The fact that you can get a small cluster of these linked up like that for 400k, all wrapped in a box, makes HPC quite a bit more accessible. Even 5 years ago, if you wanted to run a regional sized weather model at reasonable resolution, you needed to have some serious funding (say, nation states or oil / insurance companies). Nowadays you could do it with some angel investment and get one of these Nvidia boxes and just program them like they’re one GPU.

3 more replies

jabl5y ago

The Nvidia A100 80GB already provides 2 TB/s mem BW today. Also using HBM2e.

nickflood5y ago

RTX 3090 also does 936 GB/s which is very close to Radeon 7 but with conventional GDDR

alexhutcheson5y ago· 3 in thread

The fact that they are using a Neoverse core licensed from ARM seems to imply that there won’t be another generation for NVidia’s Denver/Carmel microarchitectures. Somewhat of a shame, because those microarchitectures were unorthodox in some ways, and it would have been interesting to see where that line of evolution would have led.

I believe this leaves Apple, ARM, Fujitsu, and Marvell as the only companies currently designing and selling cores that implement the ARM instruction set. That may drop to 3 in the next generation, since it’s not obvious that Marvell’s ThunderX3 cores are really seeing enough traction to be worth the non-recurring engineering costs of a custom core. Are there any others?

klelatti5y ago

Designing but not yet selling Qualcomm / Nuvia?

alexhutcheson5y ago

Yeah will be interesting to see if and when they bring a design to market.

intvocoder5y ago

The ThunderX3 team is mostly gone, it's been hollowed out.

valine5y ago· 3 in thread

I like the sound of a non-Apple arm chip for workstations. Given my positive experience with the M1 I'd be perfectly happy never using x86 again after this market niche is filled.

awill5y ago

Me too. But my decades old steam collection isn't looking forward to it. That's one advantage of cloud gaming. It won't matter what your desktop runs on.

webaholic5y ago

I don't think this will be anywhere near as good as the M1, since they are using the ARM Neoverse cores.

ac295y ago

Apple throws a lot of transistors at their 4 performance cores in the M1 to get the performance they do. It's not clear that approach would realistically scale to a workstation CPU with 16, 32, or more cores (at least not with current fab capabilities).

cma5y ago· 3 in thread

Real business-class features we want to know about:

Will they auto-detect workloads and cripple performance (like the mining stuff recently)? Only work through special drivers with extra licensing fees depending on the name of the building it is in (data center vs office)?

rubatuga5y ago

Market segmentation is practiced by every chip company that you use. Intel: ECC. AMD: ROCM. Qualcomm: cost as percentage of the phone price.

cma5y ago

I still think Nvidia takes it further.

volta835y ago

Every company does market segmentation: it makes sense to have customers that want a feature pay more for it.

Still, every company does it differently.

For example, both NVIDIA and AMD compute GPUs are necessarily more expensive than gamer GPUs because of hardware costs (e.g. HBM).

However, NVIDIA gamer GPUs can do CUDA, while AMD gamer GPUs can't do ROCm.

The reason is that NVIDIA has 1 architecture for gaming and compute (Ampere), while AMD has two different architectures (RDNA and CDNA).

1 more reply

rexreed5y ago· 3 in thread

Honestly the bottom down-voted comment has it right. What AI application is actually driving demand here? What can't be accomplished now (or with reasonable expenditures) that can be accomplished by this one CPU that will be released in 2 yrs? What AI applications will need this 2 yrs from now that don't need it now?

I understand the here-and-now AI applications. But this is smelling more like Big AI Hype than Big AI need.

gwern5y ago

Huang said "We expect to see multi-trillion-parameter models by next year, and 100 trillion+ parameter models by 2023". He probably knows more about what AI applications there are than you do, and spends a large chunk of the keynote discussing many applications.
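
For a sense of scale, here is a back-of-the-envelope sketch of how much memory the weights alone of such models would occupy. It assumes 2 bytes per parameter for fp16 and 4 for fp32, counts only the weights (no optimizer state, gradients, or activations), and uses decimal terabytes; `weights_terabytes` is a hypothetical helper.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_terabytes(num_params: int, dtype: str = "fp16") -> float:
    """Memory for the raw weights only, in decimal terabytes (1 TB = 1e12 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e12


# The model sizes mentioned in the quote above.
for label, n in [("1 trillion", 10**12), ("100 trillion", 100 * 10**12)]:
    print(f"{label} params: {weights_terabytes(n):.0f} TB in fp16, "
          f"{weights_terabytes(n, 'fp32'):.0f} TB in fp32")
```

Training needs several times more than this once optimizer state and activations are included, which is one way to read the emphasis on memory capacity and interconnect bandwidth in the announcement.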

cracker_jacks5y ago

"640K ought to be enough for anybody."

wmf5y ago

GPT-4 and GPT-5.

filereaper5y ago· 2 in thread

Looks like NVidia broke up with POWER on IBM and made their own chip.

They have interconnects from Mellanox, GPUs and their own CPUs now.

I suspect the supercomputing lists will be dominated by NVidia now.

physicsguy5y ago

IBM have basically hollowed out their team, so I'd say it's IBM ditching the market more than anything... our centre would not now consider POWER even though we currently have nodes.

arcanus5y ago

That is certainly the trend. AMD is bringing Frontier online later this year, which might be the only counter to this.

rektide5y ago· 2 in thread

There's a lot of interconnects (CCIX, CXL, OpenCAPI, NVLink, GenZ) brewing. Nvidia going big is, hopefully, a move that will prompt some uptake from the other chip makers. 900GBps link, more than main memory: big numbers there. Side note, I miss AMD being actively involved with interconnects. InfinityFabric seems core to everything they are doing, but back in the HyperTransport days it was something known, that folks could build products for, interoperate with. Not many did, but it's still frustrating seeing AMD keeping cards so much closer to the chest.

rektide5y ago

lot of downvotes. anyone want to say any reason why they think this deserves a downvote? very unclear to me. do you all just not have the historical context? what's wrong here? give me some hints why you don't get what i'm saying here.

pezezin5y ago

That's something weird I have noticed about HN: sometimes perfectly reasonable comments are downvoted to hell without any reply. At least in the good ol' Slashdot days you would get the reason why you got downvoted; now... nothing.

nabla95y ago· 2 in thread

Finally news from Nvidia that really moved markets.

  Nvidia +4.68%, 
  Intel  -4.65% 
  AMD    -4.47%

011000115y ago

I wonder how permanent this is. As a Nvidian who sells his shares as soon as they vest and who owns some Intel for diversification, I wonder if I should load up on Intel? You really can't compete with their fab availability. Having a great design means nothing unless you can get TSMC to grant you production capacity.

nabla95y ago

TSMC takes orders years ahead and builds capacity to match working together with big customers. Those who pay more (price per unit and large volume) get first shot. That's why Apple is always first, followed by Nvidia and AMD, then Qualcomm.

There is bottled demand because Intel's failure to deliver was not fully anticipated by anyone.

CalChris5y ago· 1 in thread

Grace, in contrast, is a much safer project for NVIDIA; they’re merely licensing Arm cores rather than building their own ...

NVIDIA is buying ARM.

klelatti5y ago

Trying to buy Arm.

Multiple competition investigations permitting.

callesgg5y ago· 1 in thread

Couldn't super-parallel ARM chips be a future thing for Nvidia or another chip manufacturer? A normal CPU die with thousands of independent cores.

astrange5y ago

That's Xeon Phi (formerly known as Larrabee) but in general this isn't that useful. Or rather, when it is useful it's called a GPU.

de6u99er5y ago· 1 in thread

Don't know if it's just me but this product looks like a beta-product for early adopters.

rektide5y ago

It's initially for two huge HPC systems. It'll be interesting to see what kind of availability it ever has to the rest of the world.

legulere5y ago· 1 in thread

Big Data, Big AI, what's next? Big Bullshit?

jhgb5y ago

Nah, that's already been here for quite a while.

GrumpyNl5y ago· 1 in thread

I need a new video card and there are no Nvidia to buy, all is bought by miners. Will it go the same with this card?

redtriumph5y ago

Currently, there are no plans for consumer-grade CPUs. Even this new CPU class is shipping in 2023.

Bluestein5y ago

The whole combination of AI and the name gives "watched over by machines of loving grace" a whole new twist, eh?

temp6675y ago

I know we are going to hear from the Apple haters soon or those that don't like what apple is doing (modular upgradeable systems going away) BUT it seems like Apple is moving in a similar direction as nvidia.

Apple is also, I think, going to soldered-on / close-in RAM. Nvidia looks to be doing this too, with CPU / GPU / RAM all close together, and it doesn't look like there are any upgrade options. Some thinking was that Apple was continuing to increase durability / reliability etc. with their RAM move.

Does anyone know the requirements for the LPDDR5X type of RAM mentioned here? Does this require soldering things (you obviously get lots more control if you spec chips yourself and solder them on)?

crb0025y ago

+1 ECC RAM

TheMagicHorsey5y ago

Is anyone but Apple making big investments in ARM for the desktop? This is another ARM for the datacenter design.

If other companies don't make genuine investments in ARM for the desktop, there's a real chance that Apple will get a huge and difficult-to-assail application performance advantage as application developers begin to focus on making Mac apps first, and port to x86 as an afterthought.

Something similar happened back in the day when Intel was the de facto king, and everything on other platforms was a handicapped afterthought.

I wouldn't want to have my desktops be 15 to 30% slower than Macs running the same software, simply because of emulation or lack of local optimizations.

So I'm really looking forward to ARM competition on the desktop.

1MachineElf5y ago

I wonder what percentage of its supported toolchain components will be proprietary.

gradschoolfail5y ago

If the next one is Jean or Ada we know they took it from a google search.

jabl5y ago

rincebrain5y ago

[1] - https://en.wikipedia.org/wiki/NVLink

Aissen5y ago

With 50Gb/s per lane, that would be 144 lanes to reach 900GB/s. Quite impressive.

rincebrain5y ago

[1] - https://www.nvidia.com/en-us/data-center/nvlink/

freeone30005y ago

Depends on how big you want to make it. If they're willing to go four inches, that'd do it with existing per-pin speeds from NVLink 3.

remexre5y ago· 5 in thread

> Today at GTC 2021 NVIDIA announces its first CPU

Wait, Nvidia's been making ARM CPUs for years now; most memorably Project Denver.

jdsully5y ago

NVIDIA called it their first “data center CPU”. Our helpful reporter simplified it to the point of being flat out wrong. Not uncommon.

justin665y ago

I expected more from a site called VideoCardz.

015a5y ago

Arguably, most memorably, Tegra; the CPU/GPU which powers the Nintendo Switch.

Jasper_5y ago

That uses a licensed ARM Cortex design under the hood.

uxp1005y ago

The Tegra line included Denver and Carmel cores. Tegra was the product line, then the Switch chips have their own names.

api5y ago· 5 in thread

AnthonyMouse5y ago

But another reason they won't do it is that TSMC has a finite amount of 5nm fab capacity. They can't make more of the chips than they already do.

api5y ago

I'm thinking of a 64-core M1. It would not be the laptop chip.

ac295y ago

_ph_5y ago

bombcar5y ago

How much of that performance is on-chip memory and how usable/scalable is that? An Xserve that is limited to one CPU and can't have more RAM would pretty mediocre.

lprd5y ago· 5 in thread

hilios5y ago

Hamuko5y ago

moistbar5y ago

Does ARM have a uniquely complex build process, or is it the mix of architectures that makes it more difficult?

sumtechguy5y ago

Hamuko5y ago

Mix of architectures and the fact that our normal CI server is still x86-based and really didn't want to do ARM builds.

ksec5y ago· 4 in thread

Based on Future ARM Neoverse, so basically nothing much to see here from CPU perspective, What really stands out, are those ridiculous number from its Memory system and Interconnect.

CPU: LPDDR5X with ECC Memory at 500+GB/s Memory Bandwidth. ( Something Apple may dip into. R.I.P for Mac with upgradable Memory )

GPU: HBM2e at 2000 GB/s. Yes, three zeros, this is not a typo.

NVLink: 500GB/s

This will surely further solidify CUDA dominance. Not entirely sure how Intel's XE with OneAPI and AMD's ROCm is going to compete.

Dylan168075y ago

> GPU: HBM2e at 2000 GB/s. Yes, three zeros, this is not a typo.

It's a good step forward but your average consumer GPU is already around a quarter to a third of that and a Radeon VII had 1000 GB/s two years ago.

m_mueller5y ago

3 more replies

jabl5y ago

The Nvidia A100 80GB already provides 2 TB/s mem BW today. Also using HBM2e.

nickflood5y ago

RTX 3090 also does 936 GB/s which is very close to Radeon 7 but with conventional GDDR

alexhutcheson5y ago· 3 in thread

klelatti5y ago

Designing but not yet selling Qualcomm / Nuvia?

alexhutcheson5y ago

Yeah will be interesting to see if and when they bring a design to market.

intvocoder5y ago

The ThunderX3 team is mostly gone, it's been hollowed out.

valine5y ago· 3 in thread

I like the sound of a non-Apple arm chip for workstations. Given my positive experience with the M1 I'd be perfectly happy never using x86 again after this market niche is filled.

awill5y ago

Me too. But my decades old steam collection isn't looking forward to it. That's one advantage of cloud gaming. It won't matter what your desktop runs on.

webaholic5y ago

I don't think this will be anywhere near as good as the M1, since they are using the ARM Neoverse cores.

ac295y ago

cma5y ago· 3 in thread

Real business-class features we want to know about:

rubatuga5y ago

Market segmentation is practiced by every chip company that you use. Intel: ECC. AMD: ROCM. Qualcomm: cost as percentage of the phone price.

cma5y ago

I still think Nvidia takes it further.

volta835y ago

Every company does market segmentation: it makes sense to have customers that want a feature pay more for it.

Still, every company does it differently.

For example, both NVIDIA and AMD compute GPUs are necessarily more expensive than gamer GPUs because of hardware costs (e.g. HBM).

However, NVIDIA gamer GPUs can do CUDA, while AMD gamer GPUs can't do ROCm.

The reason is that NVIDIA has 1 architecture for gaming and compute (Ampere), while AMD has two different architectures (RDNA and CDNA).

1 more reply

rexreed5y ago· 3 in thread

I understand the here-and-now AI applications. But this is smelling more like Big AI Hype than Big AI need.

gwern5y ago

cracker_jacks5y ago

"640K ought to be enough for anybody."

wmf5y ago

GPT-4 and GPT-5.

filereaper5y ago· 2 in thread

Looks like NVidia broke up with POWER on IBM and made their own chip.

They have interconnects from Mellanox, GPUs and their own CPUs now.

I suspect the supercomputing lists will be dominated by NVidia now.

physicsguy5y ago

IBM have basically hollowed out their team, so I'd say it's IBM ditching the market more than anything... our centre would not now consider POWER even though we currently have nodes.

arcanus5y ago

That is certainly the trend. AMD is bringing Frontier online later this year, which might be the only counter to this.

rektide5y ago· 2 in thread

rektide5y ago

pezezin5y ago

nabla95y ago· 2 in thread

Finally news from Nvidia that really moved markets.

  Nvidia +4.68%, 
  Intel  -4.65% 
  AMD    -4.47%

011000115y ago

nabla95y ago

There is bottled-up demand because Intel's failure to deliver was not fully anticipated by anyone.

CalChris5y ago· 1 in thread

Grace, in contrast, is a much safer project for NVIDIA; they’re merely licensing Arm cores rather than building their own ...

NVIDIA is buying ARM.

klelatti5y ago

Trying to buy Arm.

Multiple competition investigations permitting.

callesgg5y ago· 1 in thread

Super-parallel Arm chips: could that not be a future thing for Nvidia or another chip manufacturer? A normal CPU die with thousands of independent cores.

astrange5y ago

That's Xeon Phi (formerly known as Larrabee) but in general this isn't that useful. Or rather, when it is useful it's called a GPU.

de6u99er5y ago· 1 in thread

Don't know if it's just me but this product looks like a beta-product for early adopters.

rektide5y ago

It's initially for two huge HPC systems. It'll be interesting to see what kind of availability it ever has to the rest of the world.

legulere5y ago· 1 in thread

Big Data, Big AI, what's next? Big Bullshit?

jhgb5y ago

Nah, that's already been here for quite a while.

GrumpyNl5y ago· 1 in thread

I need a new video card and there are no Nvidia cards to buy; they have all been bought by miners. Will it go the same way with this card?

redtriumph5y ago

Currently, there are no plans for consumer-grade CPUs. Even this new CPU class is shipping in 2023.

Bluestein5y ago

The whole combination of AI and the name gives "watched over by machines of loving grace" a whole new twist, eh?

temp6675y ago

Does anyone know the requirements for the LPDDR5X type of RAM mentioned here? Does this require soldering things (you obviously get lots more control if you spec chips yourself and solder them on)?

crb0025y ago

+1 ECC RAM

TheMagicHorsey5y ago

Is anyone but Apple making big investments in ARM for the desktop? This is another ARM for the datacenter design.

Something similar happened back in the day when Intel was the de facto king, and everything on other platforms was a handicapped afterthought.

I wouldn't want to have my desktops be 15 to 30% slower than Macs running the same software, simply because of emulation or lack of local optimizations.

So I'm really looking forward to ARM competition on the desktop.

1MachineElf5y ago

I wonder what percentage of its supported toolchain components will be proprietary.

gradschoolfail5y ago

If the next one is Jean or Ada we know they took it from a google search.
