- 1S/2S is obviously where the pie is. Few servers are 4S.
- 8 DDR4 channels per socket is twice the memory bandwidth of LGA 2011, and still more than LGA 3647 (or whatever the number was)
- First x86 server platform with SHA1/2 acceleration
- 128 PCIe lanes in a 1S system is unprecedented
All in all, Naples seems like a very interesting platform for throughput-intensive applications. Overall it seems that Sun, with its Niagara approach (massive number of threads, lots of I/O on-chip), was just a few years too early (and likely a few thousand dollars per system too expensive ;)
Yes, definitely drooling at this. Assuming a workload that doesn't eat too much CPU, this would make for a relatively cheap and hassle-free non-blocking 8 GPU @ 16x PCIe workstation. I wants one.
This one will be interesting. The current Ryzen (like most of the Intel desktop range) has two memory channels, but everyone has been benchmarking it against the i7-6900K because they both have eight cores. The i7-6900K is the workstation LGA 2011 part with four channels. If the workstation Ryzen has eight channels...
By comparison, SPARC's performance improved substantially moving from the T1 and T2 to the T3 and beyond. The T1 used a round-robin policy to issue instructions from the next active thread each cycle, supporting up to 8 fine-grained threads in total. That made it more like a barrel processor.
Starting with the T3, two of the threads could execute simultaneously. Then, starting with the T4, SPARC added dynamic threading and out-of-order execution. Later versions are even faster, and clock speeds have risen considerably.
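The round-robin issue policy described above can be sketched as a toy model: each cycle, the core issues one instruction from the next thread in a fixed rotation, skipping threads that have run dry. This is an illustrative simulation, not Sun's actual microarchitecture:

```python
from collections import deque

def barrel_issue(threads, cycles):
    """Toy model of round-robin (barrel) instruction issue.

    threads: dict mapping thread id -> list of instructions.
    Returns the issue order: one instruction from the next
    ready thread each cycle, skipping drained threads.
    """
    order = deque(sorted(threads))
    issued = []
    for _ in range(cycles):
        for _ in range(len(order)):
            tid = order[0]
            order.rotate(-1)  # advance the rotation
            if threads[tid]:
                issued.append((tid, threads[tid].pop(0)))
                break
        else:
            break  # every thread is drained
    return issued

issued = barrel_issue({0: ["a0", "a1"], 1: ["b0"], 2: ["c0", "c1"]}, 10)
# interleaves threads 0, 1, 2 in turn until each runs out of work
```

The upside is that per-thread pipeline stalls are hidden behind the other threads' work, which is why the T1 favored throughput over single-thread latency.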
Every mainframe interface is basically an offload interface: "computers" DMAing and processing data to the CPs and to each other. Every I/O device has a command processor, so it can handle channel errors and integrated PCIe errors in a way PCs cannot.
A PC with Chelsio NICs doing TCP offload with direct data placement or RDMA, plus Fibre Channel storage, would be mini/mainframe-ish.
And AMD should dump SHA1 acceleration in the next generation.
The cost of having that on silicon is probably close to zero. If you think SHA1 is just going to magically disappear because you want it to, well, you'll be in for a SHA1-sized surprise. Our grandkids will still have SHA1 acceleration.
>ARMv8 has had it for like 2-3 years now...
Because ARM cores don't remotely have the CPU heft of an Intel x86-64 chip, ARM needs all this acceleration because it's typically used in very low-power mobile scenarios. On top of that, Intel claims AES-NI can be used to accelerate SHA1.
https://software.intel.com/en-us/articles/improving-the-perf...
https://en.wikipedia.org/wiki/Intel_SHA_extensions
>There are seven new SSE-based instructions, four supporting SHA-1 and three for SHA-256:
>SHA1RNDS4, SHA1NEXTE, SHA1MSG1, SHA1MSG2, SHA256RNDS2, SHA256MSG1, SHA256MSG2
The only processors so far with these extensions are low power Goldmont chips.
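For reference, the SHA1RNDS4/SHA256RNDS2-family instructions accelerate the round computations of the standard hashes, so the digests come out identical with or without the hardware path. A minimal sanity check using Python's `hashlib` and the well-known FIPS 180 test vectors for the message "abc" (whether a hardware SHA path is actually used depends on the CPU and the library build):

```python
import hashlib

# FIPS 180 test vectors for the message "abc"; the hardware
# instructions compute exactly these rounds when available.
assert hashlib.sha1(b"abc").hexdigest() == \
    "a9993e364706816aba3e25717850c26c9cd0d89d"
assert hashlib.sha256(b"abc").hexdigest() == \
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
```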
At the previous job where I built the 64-core system, I even emailed the AMD marketing department to see if we could do some PR campaign together, but I think it was too soon before the Naples drop, because I never got a response. Here's to hoping Supermicro does a 4-CPU board for this... 128 cores would be amazing. (But I'll take 64 Naples cores as long as it gets rid of the bugs and issues I found with the Opterons.)
Very few of the operations used GPU. Things may have changed since I was working there, but the work at the time wasn't suited for a GPU architecture.
The initial step was sequence cleanup, which is a hidden Markov model executed over a collection of sequences of varying length, so hard to parallelize. Sequence annotation is embarrassingly parallel on a per-library basis (each sequence can be annotated independently of the others), but the computational work is fuzzy string matching, which is once again hard to GPU-ize. Another major computational job was contig assembly, which is somewhat parallelizable (pairwise sequence comparisons), but once again involves fuzzy string matching, so not GPU-izable.
So that's just sequence genetics. Don't know if GPUs are used in other areas.
Lots of cores, lots of threads, and lots of main memory. That was the key.
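The fuzzy string matching mentioned above is essentially dynamic programming over pairs of sequences. A minimal Levenshtein-distance sketch shows the shape of the workload (illustrative only; real sequence aligners add affine gap penalties, banding, and scoring matrices):

```python
def edit_distance(a, b):
    """Classic dynamic-programming edit distance.

    Each cell depends on its three neighbors, and real aligners
    add data-dependent branching (gap penalties, early exits),
    which is part of what makes this awkward to map onto GPUs.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

edit_distance("kitten", "sitting")  # 3
```

On a many-core CPU you can simply run one pairwise comparison per thread, which is exactly the "lots of cores, lots of threads" profile described above.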
http://ce-publications.et.tudelft.nl/publications/1520_gpuac...
A quote from an AnandTech forum post [0] reads promising:
"850 points in Cinebench 15 at 30W is quite telling. Or not telling, but absolutely massive. Zeppelin can reach absolutely monstrous and unseen levels of efficiency, as long as it operates within its ideal frequency range."
A comparison against a Xeon D at 30W would be interesting.
The possibility of this monster maybe coming out sometime in the future is also quite nice: http://www.computermachines.org/joe/publications/pdfs/hpca20...
[0] https://forums.anandtech.com/threads/ryzen-strictly-technica...
With that said, I'm looking forward to these systems.
However, there is an important difference. AMD seems to be putting multiple dies into the same package, whereas Intel (as the Cluster-on-Die name implies) has everything on the same die. So my fear is that the interconnect between dies may not be fast enough to paper over our NUMA weaknesses.
"x U" (or "x HE", if you're talking to a German manufacturer, they like to make that mistake ;) are rack units, i.e. how large the case is.
Also, what about ECC? Does Ryzen support it or not?
The underperformance in gaming was tracked down to software issues according to AMD. Namely:
- bugs in the Windows process scheduler (scheduling 2 threads on the same core, and moving threads across CPU complexes, which loses all L3 cache data since each CCX has its own cache)
- buggy BIOSes accidentally disabling Boost or the High Performance mode (a feature that lets the processor adjust voltage and clock every 1 ms instead of every 40 ms)
- games containing Intel-optimized code
More info: http://wccftech.com/amd-ryzen-launch-aftermath-gaming-perfor...
Furthermore, hardcore gamers usually play at 1440p or higher, in which case there is no difference in performance between Intel and AMD, as demonstrated by many benchmarks (because the GPU is always the bottleneck at such high resolutions).
BTW they advertised it as good for gaming + streaming (h264 CPU encoding at the same time on the same machine). And "content creation", which pretty much always means video editing.
IIRC Ryzen supports unbuffered ECC if the mainboard supports it.
2. Compared to the desktop/Windows ecosystem, there is much more open-source software on the server side, along with the usual open-source compilers. That means any AMD Zen optimization will be far easier to deploy than for games and apps on the desktop coded and compiled with Intel's ICC.
3. The sweet spot for server memory is still 16GB DIMMs. 256GB of memory for your caching needs or in-memory database will now be much cheaper.
4. When are we going to get much cheaper 128GB DIMMs? Fitting 2TB of memory per socket, and 4TB per U, along with 128 lanes for NVMe SSD storage, the definition of Big Data just grew a little bigger.
5. Between now and 2020, the roadmap has Zen+ and 7nm, along with PCIe 4.0. I am very excited!
Yes, and it's rumored that the top-end 7nm chip will be 48 cores (codenamed Starship). Exciting times ahead now that the competition is back.
How will Naples fare on this front?
I'm glad I don't own any Intel stock atm :)
A high core count, energy efficient CPU with IO out the wazoo?
I'm happy I bought AMD stock over the summer (:
The main scalability issue I have with Postgres is its horrible layout of data pages on disk. You can't order rows to be laid out on disk according to the primary key. You can CLUSTER the table every now and then, but that's not really practical for most production loads.
I don't think there's been any work on it yet though.
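The cost of an unclustered layout can be sketched with a toy model: rows live on fixed-size pages, and a range scan over the primary key has to read every page that holds a matching row. Hypothetical numbers and layout, not Postgres internals:

```python
import random

PAGE_SIZE = 4  # rows per page (toy value)

def pages_touched(rows_in_disk_order, key_range):
    """Count distinct pages a range scan must read.

    rows_in_disk_order: primary keys in their on-disk order.
    key_range: (lo, hi) inclusive range of keys being scanned.
    """
    lo, hi = key_range
    return len({i // PAGE_SIZE
                for i, key in enumerate(rows_in_disk_order)
                if lo <= key <= hi})

keys = list(range(32))
clustered = keys                             # disk order follows the key
random.seed(0)
scattered = random.sample(keys, len(keys))   # heap-like insertion order

n_clustered = pages_touched(clustered, (8, 15))  # 2: the range is contiguous
n_scattered = pages_touched(scattered, (8, 15))  # typically many more pages
```

This is exactly why periodic CLUSTER helps range queries: it restores the contiguous layout, until new writes scatter the rows again.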
My guess is the 1-socket option scales great. 2 sockets are less than ideal, and you will not double the 1-socket performance.
I've seen benchmarks on the -hackers mailing list with 88 core Intel servers (4s 22c) in regard to eliminating bottlenecks when you have that many cores. So even if it's not 100% there yet, it will be soon.
https://semiaccurate.com/2016/11/17/intel-preferentially-off...
Basically, using multiple dies significantly increases latency between cores on different dies. This will affect performance. I will not judge till I see the benchmarks though :-)
http://wccftech.com/amd-exascale-heterogeneous-processor-ehp...
I'd like to have that in the old project quantum package: http://wccftech.com/amd-project-quantum-not-dead-zen-cpu-veg...
That would be a TFLOPS level supercomputer on your desk.
Well, not with HBM (which is DRAM), but huge amounts of L3 SRAM on an MCM... POWER5, I believe.
Though I am kind of worried concerning memory access. Latency penalties when accessing non-local memory are very high on Zen CPUs due to the multi-die architecture design.
Does that mean we will finally see some serious interest in shared-nothing designs and the like in the future?
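A shared-nothing layout in miniature: each shard owns a disjoint partition of the data and requests are routed by key, so no memory is shared across partitions (and on a multi-die part, each partition could in principle be pinned to its local NUMA node). A hypothetical sketch, not tied to any particular framework:

```python
class ShardedStore:
    """Toy shared-nothing key-value store: each shard owns its
    keys exclusively, so shards never touch each other's memory."""

    def __init__(self, n_shards):
        self.shards = [{} for _ in range(n_shards)]

    def _shard(self, key):
        # Route by hash so every key has exactly one owner shard.
        return self.shards[hash(key) % len(self.shards)]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key):
        return self._shard(key).get(key)

store = ShardedStore(4)
store.put("naples", 32)
store.get("naples")  # 32
```

Because a key is only ever touched by its owning partition, cross-die memory traffic is limited to the request routing, which is the property that makes this design attractive when remote-memory latency is high.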
So the single-socket systems can have more PCIe lanes available, but the dual-socket systems have fewer per socket because some of those lanes are used for HyperTransport.
What I can't figure out is why Intel and AMD aren't using similar links (HyperTransport for AMD and QPI for Intel) to connect directly to GPUs in a cache-coherent way. These days the faster interconnects spend a decent fraction of their latency just getting across the PCIe bus twice.
So 100 Gbit networks, InfiniBand, GPUs, etc. could all take advantage of a lower-latency, cache-coherent interface, but it's not available.
I suspect mainly because QPI and HyperTransport are incompatible, and PCIe is good enough for the high-volume cases.