“The new top system, Fugaku, turned in a High Performance Linpack (HPL) result of 415.5 petaflops, besting the now second-place Summit system by a factor of 2.8x. Fugaku is powered by Fujitsu’s 48-core A64FX SoC, becoming the first number one system on the list to be powered by ARM processors. In single or further reduced precision, which are often used in machine learning and AI applications, Fugaku’s peak performance is over 1,000 petaflops (1 exaflops). The new system is installed at RIKEN Center for Computational Science (R-CCS) in Kobe, Japan.
(I'm not an HPC guy; this is just my opinion.)
The only US-owned state-of-the-art fabs in the US belong to Intel. Intel survives because they have a high margin on x86 CPUs. Today, TSMC announced 5nm, and the top supercomputer is ARM-based.
Apple seems to be going ARM. Chromebooks are ARM. Microsoft now offers Windows on ARM, on the Surface Pro X. Mobile never used x86. x86 is on the way out. What's left for Intel?
(Micron is still a major force in DRAM, amazingly.)
Is the "US owned" clarification to exclude Global Foundries' New York fab? :D
> Chromebooks are ARM
Maybe half of them.
> What's left for Intel?
Fabricating others' designs like TSMC?
But also, Intel isn't going away any time soon, just not being a monopoly anymore.
Technology: 28 nm and 14 nm. 7 nm planned. However, in August 2018, GlobalFoundries made the decision to suspend 7 nm development and planned production, citing the unaffordable costs to outfit Fab 8 for 7 nm production. GlobalFoundries held open the possibility of resuming 7 nm operations in the future if additional resources could be secured.
So, not a state of the art fab. Couldn't afford to keep up.
Intel has reported a record quarter every quarter for the last 2-3 years.
Fabless semiconductors are doing better than ever - nVidia, AMD, Apple, Qualcomm, Google TPU etc.
Most of the high performance ARM SoCs come from Apple and Qualcomm, both American companies.
Due to high demand: people needed 30%+ more processing power after they lost ~30% of it to the Spectre/Meltdown mitigations. When Intel gets manufacturing and supply back up to a reasonable standard, they'll start building inventory and selling cheaper chips again. Margin and ASP will decrease, and investors will check out.
According to https://en.wikipedia.org/wiki/Usage_share_of_operating_syste..., about 80% to 90% of the desktop and laptop computer market (counting the share of Windows + Linux devices in these categories). Intel won't starve.
Expect to see these numbers change drastically over the next years, as Zen 2 finally turned the ship around by making AMD CPUs not just competitive but, in many cases, the straight-up better, and still more affordable, choice.
Which is already reflected in current trends: barely any consumer-level hardware outlet still recommends Intel builds, down to the lack of PCIe 4.0 support and the fact that only very expensive Intel CPUs can outperform AMD CPUs in fringe use cases like single-core gaming performance, while still demanding a hefty price premium.
A premium that many people are simply not willing to pay anymore.
As a small data point, just look at the top 10 CPUs on price-comparison websites, like the German pcgameshardware [0]: 8 of the top 10 are AMD.
Which does not mean that Intel will starve, but it very much puts them in the position AMD has been in these past years: the underdog fighting an uphill battle to regain relevancy in the consumer sector.
That others can compete directly on price. Intel can probably do it technically, but will not have the margins they had with x86.
Are you forgetting Amazon's Graviton? And Nuvia seems to be in a good position to dethrone Apple in best ARM CPU race too.
The only thing that stops them is their will to do it.
Well I'd like to see them go all in on Desktop Linux.
But then I'm a dreamer.
I really like the NUC products. They have the knowledge and skills to do everything in house (CPU, RAM, SoC, radio/wifi/cell (or used to), storage) except perhaps industrial design (maybe they do that too). But I have never personally had a software experience from Intel that I remotely enjoyed.
Edit: Actually there is software I have used from Intel that I have really enjoyed, the BIOS for the NUC. So I stand corrected.
Jimmy Kimmel Can You Name a Country? strikes again https://www.youtube.com/watch?v=kRh1zXFKC_o
There's a lot that's non-mainstream in this, as with K, and it's partly influenced by the K experience. Unusually, it's all apparently specifically designed for the job, from the processor to the operating system (only partly GNU/Linux). Notably, despite the innovation, it should still run anything that can reasonably be built for aarch64 straight off and use the whole node, even if it doesn't run particularly fast; contrast GPU-based systems. (With something like simde, you may even be able to run typical x86-specific code.) However, the amount of memory per core is surprising -- even less than Blue Gene/Q -- and I wonder how that works out for the large-scale materials science work for which it's obviously prepared. Also note Fujitsu's attention to reliability, though the oft-quoted theory of failure rates in exascale-ish machines was obviously wrong; otherwise, as the Livermore CTO said, he'd be out of a job.
The bad news for anyone potentially operating a similar system in a university, for instance, is that the typical nightmare proprietary software is apparently becoming available for it...
I think it's a limitation of the technology. HBM2 provides amazing bandwidth, but capacity is quite limited. And it's not like DIMM slots where you can just insert more of them, the memory chips are bonded to the substrate chiplet-style.
This is very similar to high-end GPU's which also use HBM2 memory, e.g. NVIDIA A100 has 40 GB.
ARM has been making major strides in the high-performance area. The new AWS Graviton processors are pretty nice from what I have heard. And then there's ARM in the Mac. Yup, and Julia will run on all of these!
While I say all of this, I should also point out that the top500 benchmark pretty much is not representative of most real-life workloads, and is largely based on your ability to solve the largest dense linear solve you possibly can - something almost no real application does.
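For a sense of scale, HPL is essentially this dense solve pushed as large as the machine allows. A single-node toy version (the problem size here is an arbitrary choice, not anything from the benchmark rules):

```python
import time
import numpy as np

# Back-of-envelope HPL-style run: solve a dense Ax = b and estimate flop rate.
n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)   # LU factorization + triangular solves
elapsed = time.perf_counter() - t0

flops = (2 / 3) * n**3      # leading-order flop count for LU
print(f"~{flops / elapsed / 1e9:.1f} GFLOP/s on this node")

# Sanity check: the solution actually satisfies Ax = b.
assert np.allclose(A @ x, b)
```

The real benchmark distributes this factorization across the whole machine, but the kernel being timed is the same.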
(The website is down, so I haven't been able to look at the specs of the actual machine).
https://www.fujitsu.com/global/about/resources/news/press-re...
>A64FX is the world's first CPU to adopt the Scalable Vector Extension (SVE), an extension of Armv8-A instruction set architecture for supercomputers. Building on over 60 years' worth of Fujitsu-developed microarchitecture, this chip offers peak performance of over 2.7 TFLOPS, demonstrating superior HPC and AI performance.
I wonder what BLAS they are using, and if the contributions are open sourced.
They also publish the HPCG benchmark, with sparse matrices, and, unsurprisingly, flops are an order of magnitude lower across the board. The Fujitsu chip scales a whole lot better than the usual Nvidia GPUs, though.
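A rough sketch of why sparse kernels post far lower flop numbers: a sparse matrix-vector product does only a couple of flops per byte fetched, so it is bandwidth-bound rather than compute-bound. The matrix below is a made-up 1-D stencil for illustration, not anything from HPCG:

```python
import numpy as np
from scipy import sparse

n = 100_000
# 1-D Poisson-like tridiagonal matrix in CSR format (illustrative stencil)
A = sparse.diags([-1, 2, -1], offsets=[-1, 0, 1], shape=(n, n), format="csr")
x = np.ones(n)

y = A @ x                        # sparse mat-vec, the core HPCG-style kernel
flops = 2 * A.nnz                # one multiply + one add per stored nonzero
bytes_moved = A.nnz * (8 + 4)    # float64 value + int32 column index, roughly
print(f"arithmetic intensity ~ {flops / bytes_moved:.2f} flops/byte")
```

Compare that fraction of a flop per byte with dense matmul, where intensity grows with problem size; this is why memory bandwidth, not peak flops, decides HPCG standings.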
Sounds right.
I was going to say what about large-scale optimization problems? But I realized that most typically only require sparse linear solves.
Newton-type methods do require the solution of dense Ax=b systems, but plain gradient descent doesn't, and the most visible/popular application of large-scale gradient methods today, neural networks, typically uses SGD, which requires no dense linear solves at all.
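A minimal sketch of that point about SGD, on a least-squares problem with made-up sizes: every step is just matrix-vector products and a subtraction, with no Ax=b solve anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 10))
b = A @ np.arange(10.0)                 # ground-truth weights 0..9, no noise

w = np.zeros(10)
lr = 0.01
for step in range(2000):
    i = rng.integers(0, 1000, size=32)  # random minibatch of 32 rows
    grad = A[i].T @ (A[i] @ w - b[i]) / 32
    w -= lr * grad                      # only mat-vecs, never a linear solve
```

With noiseless targets, the iterates converge to the true weights; the relevant point is that nothing here needs an O(n^3) factorization.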
I don't have any sense of how much these cost to manufacture. There ought to be a market for an A64FX-based rackmount server system. If the price isn't outrageous, I'd love to see these sold as an SBC.
IIRC the extension was specifically designed by Fujitsu for their needs, so it is not surprising.
That benchmark consists of sparse matrices, which are a much more realistic depiction of HPC workloads. The A64FX seems to scale a lot better with irregular access patterns than the Nvidia GPUs in the other systems do.
Edit: Turns out I was right. Maybe this link is better.
https://www.anandtech.com/show/15869/new-1-supercomputer-fuj...
https://www.hotchips.org/hc30/2conf/2.13_Fujitsu_HC30.Fujits...
Edit: seems to be a Fujitsu designed interconnect [0]. Wonder how much of the overall performance is dependent on the difference in communication.
https://www.fujitsu.com/global/documents/solutions/business-...
This delivers a huge boost in bandwidth.
HBM stands for High Bandwidth Memory. It offers up to 900 GB/s.
Now add the Tofu interconnect on top and you have a system finely tuned for maximizing data movement.
Remember: compute is cheap, communication is expensive.
You can have loads of GPUs and processors, but if you can't feed them data fast enough, they are useless.
"Report on the Fujitsu Fugaku (富岳) System", Jack Dongarra, Jun 22, 2020
https://www.dropbox.com/s/aqntdb43p6so0z5/fugaku-report.pdf?...
At least with the top500 benchmark, bandwidth is not a problem as long as you can run a large enough problem. It's a linear solve that spends all its time doing matmul (n^3 operations on n^2 data), so once the problem is big enough you can saturate the cores.
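That n^3-on-n^2 argument can be made concrete with a back-of-envelope arithmetic-intensity estimate. The hardware figures in the comment below are invented for illustration:

```python
# Why HPL becomes compute-bound for large n: an n x n solve does O(n^3) flops
# on O(n^2) data, so arithmetic intensity grows linearly with n.
def arithmetic_intensity(n: int) -> float:
    flops = (2 / 3) * n**3       # leading-order LU flop count
    bytes_moved = 8 * n * n      # the float64 matrix itself, ideal-cache case
    return flops / bytes_moved   # flops per byte

# A machine with, say, 1 TB/s of bandwidth and 10 TFLOP/s of compute needs
# intensity above 10 flops/byte to be compute-bound (numbers are made up).
for n in (1_000, 100_000):
    print(n, round(arithmetic_intensity(n), 1))
```

Even at modest sizes the intensity clears any realistic flops-to-bandwidth ratio, which is exactly why HPL flatters machines that sparse workloads would not.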
Car manufacturers need ridiculously expensive race cars to push the technology to get advancements in everyday car technology. Similarly, Top500 is one way to push the technology for not just computationally but also things like better power, heat and network management in processors and computers in general.
With server farms ever doubling, heat management of these systems may become a bigger contributor to environmental pollution than all vehicle exhaust. Rather than just spending on renewable sources of energy for these farms, it makes sense to optimize the energy consumed per processor. Hoping to see advancements in this area.
In my opinion, the next ARM/Intel will be the company that does energy-efficient processors.
No, extra heat just radiates away into space, nothing to worry about. The problem is in the GHG emissions to generate the energy to run (and yes, cool) these systems.
> it makes sense to optimize energy consumed per processor.
Yes, absolutely. Even if you don't care about the environment, power consumption limits chip performance. Doing more work with less power is key to producing faster chips.
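The classic first-order reason: CMOS dynamic power scales as C·V²·f, and running at a higher frequency usually demands a higher voltage too, so power grows much faster than linearly with clock speed. The numbers below are illustrative, not from any datasheet:

```python
# First-order CMOS dynamic power model: P = C * V^2 * f.
def dynamic_power(cap_farads: float, volts: float, hertz: float) -> float:
    return cap_farads * volts**2 * hertz

base = dynamic_power(1e-9, 1.0, 3e9)    # a hypothetical 3 GHz core
# +20% frequency, with the voltage bump stability typically requires:
turbo = dynamic_power(1e-9, 1.2, 3.6e9)
print(f"{turbo / base:.2f}x the power for 1.2x the speed")
```

This superlinear cost is why "more work per watt" (wider vectors, more cores, better memory systems) beats "more GHz" once you hit the power wall.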
Here's an idea, maybe we should just turn the moon into a giant data center?
We also have more power options on Earth. On the Moon you'd only really have solar power available and then only two weeks a month. On Earth you've got insolation for half a day every day and the option of other renewable sources.
This is all besides the phenomenal cost of building data centers on the Moon vs Earth.
(When I wrote this comment, I was seeing 500 status code errors when trying to load the page)
- Slides on Fugaku: https://www.fujitsu.com/global/Images/supercomputer-fugaku.p...
- SVE 512 instructions for armv8: https://www.fujitsu.com/global/Images/armv8-a-scalable-vecto...
ARM was a British company until it was bought by SoftBank a couple of years ago.
That video is from end of 2018... which makes it even more amazing.
[1]: https://trends.google.com/trends/explore?date=all&q=top500
Pretty much a GPU-like design with the ARMv8 instruction set bolted on so it can run its own OS.
PS This is meant to be sarcastic.
This seems unlikely unless Apple has decided to sell their chips for the first time ever.
I suppose it could be interesting from Apple's PR perspective, but I have serious doubts that the supercomputer owner would agree to this. What happens to them when Apple discontinues the current chip in favor of the next one?
https://www.nextplatform.com/2019/11/22/arm-supercomputer-ca...
> The number nine system on the Green500 is the top-performing Fugaku supercomputer, which delivered 14.67 gigaflops per watt. It is just behind Summit in power efficiency, which achieved 14.72 gigaflops/watt.
The writing was on the wall that RISC would win, but the x86 juggernaut appeared unbeatable.
What do you mean by "win" exactly? RISC is just an architectural choice and means nothing on its own. For reference, Google's TPUs, which - according to Google - deliver 30-80x better performance per Watt than contemporary CPUs, use a CISC design instead [1].
This whole "RISC vs. CISC" nonsense is quite inane, given that it's a design choice that's highly application-specific.
It's even debatable whether the A64FX can be considered a "pure" RISC design, considering the inclusion of SVE-512 and its 4 unspecified "assistant cores" [2]...
[1] https://cloud.google.com/blog/products/gcp/an-in-depth-look-...
[2] https://www.fujitsu.com/jp/Images/20180821hotchips30.pdf
Now, although Linpack is a better evaluation metric for a supercomputer than simply totaling up # of processors and RAM size, it's still a very specific benchmark of questionable real-world utility; people like it because it gives you a score, and that score lets you measure dick-size, err, computing power. It also, if you're feeling unscrupulous, lets you build a big worthless Linpack-solving machine which generates a good score but isn't as good for real use (an uncharitable person might put Roadrunner https://en.wikipedia.org/wiki/Roadrunner_(supercomputer) in this category)
Combine this with the fact that many applications are limited by network throughput rather than by CPU/SSD/RAM/PCIE, and performance becomes a hard thing to quantify even in terms of "how many ARM cores do i need to buy to make my CPU not be the bottleneck"
There are benchmarks for ARM linux compilation and ARM openjdk performance benchmarks which are a good start, but I don't know how to compare SKUs between those ARMs and the ones found in top500 supercomputers.
But the more interesting question for me is: on an embarrassingly parallel workload, how does Amazon’s full infrastructure compare to these top machines? I’d assume that Amazon keeps that a secret.
Either way you slice it, Amazon likely owns at least an order of magnitude more FLOPS than any single system on the top500. What they presumably don't have is the low latency interconnects, etc., needed for traditional supercomputing.
[1] https://datacenterfrontier.com/amazon-approaches-1-gigawatt-...
[2] https://www.eenews.net/stories/1060048034
[3] https://sustainability.aboutamazon.com/sustainable-operation...
A major part of what makes these machines special is their interconnect. Fujitsu is running a 6D torus interconnect with latencies well into the sub-microsecond range. The special sauce is the ability of cores to interact with each other with extreme bandwidth at extremely low latency.
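One way to see why high-dimensional tori are attractive: the worst-case hop count (diameter) of a torus is the sum of floor(k/2) over its dimensions, so spreading the same node count across more dimensions keeps worst-case hops low. The shapes below are made up for illustration, not Tofu-D's actual geometry:

```python
# Diameter (worst-case hop count) of a torus with the given per-dimension sizes.
def torus_diameter(dims):
    return sum(k // 2 for k in dims)

# Same rough node count, very different worst-case distances:
print(torus_diameter([152_064]))           # one giant 1-D ring: 76,032 hops
print(torus_diameter([8, 8, 8, 8, 8, 6]))  # hypothetical 6-D torus: 23 hops
```

Fewer worst-case hops means tighter bounds on collective-operation latency, which is what all-to-all-heavy HPC codes actually feel.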
What software would one use to distribute some workload between these two nodes, what would the latency and bandwidth be bottlenecked by (the network connection?), and what other key statistics would be important in measuring exactly how this cheap $400 (used) setup compares on price/watt/FLOP with top500 machines?
If you connect a lot of cloud instances to act as a giant distributed computing cluster, they'll receive/share/return data via network interfaces or, worse yet, the internet; this is far slower than what supercomputer interconnects do.
For many applications that solution would be more efficient than a supercomputer, but for applications that need a supercomputer it would be inefficient. It just depends on what you need to do, but in any case it would be a computing cluster not a supercomputer.
(my two cents, I’m not into that field)
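A toy strong-scaling model makes that gap concrete: per-step time is local compute plus synchronization cost. All numbers below are illustrative guesses, not measurements of any real system:

```python
# Per-step time = work / (p * per-node flops) + messages * latency.
def step_time(p, work_flops, flops_per_node, msgs_per_step, latency_s):
    return work_flops / (p * flops_per_node) + msgs_per_step * latency_s

work = 1e12   # flops per simulation step (made up)
node = 1e11   # flops/s per node (made up)
for name, lat in [("gigabit Ethernet", 50e-6), ("HPC interconnect", 1e-6)]:
    t = step_time(1024, work, node, msgs_per_step=100, latency_s=lat)
    print(f"{name}: {t * 1e3:.2f} ms/step")
```

With chatty, tightly-coupled workloads the latency term dominates as you add nodes, which is why a pile of cloud instances and a supercomputer with the same total flops behave so differently.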
"""The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server. Compared with other machines listed on the TOP500 supercomputers in the world, it ranks in the top five, Microsoft says."""
Each of the big three cloud providers could, if they chose to, build a #1 Top500 computer using what they have available (that includes the CPUs, GPUs, and interconnect). That said, it's unclear why they would: the profit would be lower than if they sold the equivalent hardware, without the low-latency interconnect, to "regular" customers. The supercomputer business isn't obscenely profitable.
- Moore's law is dead at the level of the transistor
- Architecture, HPC updates will keep coming for many years into the future
- AGI has already escaped Moore's law (i.e., development of a fully functional AGI will not be constrained by lack of Moore's law progress). And that's what really matters.
- Related note on AGI: it has escaped the data problem as well (as in we have the right kind of sensors: mainly cameras, microphones, and so on). That is, according to the categorization of AGI challenges in terms of hardware, data, algorithms, the only missing piece is the right set of algorithms.
- Somewhere between 2015 and 2025, multiple individual groups will have cracked the AGI problem independently. (But 2015 is in the past, which means there are likely groups out there that have cracked the problem and are keeping it a secret.)
- AGI-in-the-basement scenario is very doable and has been or will be done, many times over.
Ethernet may not be approaching Infiniband in raw speed and latency, but I think it's doing pretty decently with 10 Gbps going to every node.
Ethernet networks are definitely much more competitive today than 10 years ago, when Infiniband already had cheap 20 Gbps network cards but 10 Gbps Ethernet card were expensive and the network switches were a rarity.
TOP500 is not very important, but it's not meant to be much more than a simple benchmark that correlates roughly with real performance. Supercomputers aren't designed just to be number 1 on TOP500, it's a byproduct of their actual goal.
I'd expect that in the long run there will be only a few data center operators.