IBM doubles its 14nm EDRAM density, adds hundreds of megabytes of cache

(fuse.wikichip.org)

202 points · insulanian · 6y ago · 77 comments

baybal2 6y ago · 23 in thread

I wonder if it will ever see a chance to go mainstream.

The memory bottleneck is pretty much the only thing in CPU design that hasn't seen a dramatic improvement over the years. Eliminating it is the only obvious improvement pathway left with an expectation of double-digit performance gains.

So we need either very big and very fast caches, or extremely wide and low-latency memory. Both options are quite costly.

Adding on-die DRAM that can work at least as fast as 500 MHz will surely require some specialty process with a lot of compromises, like the one in the article.

Gluing something like HBM2 to the die for a second option moves the cost from the specialty process to the specialty packaging. Not much better.

dfox 6y ago

The reason mainstream DRAM interfaces are narrow and “slow” is that you need row-at-a-time access patterns to really saturate the interconnect, which does not happen for general-purpose workloads. Causing such access patterns requires large caches, which by themselves solve the issue. On top of that, physical package pins and pad structures are among the most expensive things in semiconductor design.

In the end, the DRAM array is a bunch of analog magic, and the interface works by copying the row you want into an SRAM buffer on the chip, which you can then access however you want. The slowest operations in all of that are the copies between the SRAM row buffer and the actual DRAM array. (What I call the SRAM buffer is usually called “column sense amplifiers”, but at a high level it is in fact a surprisingly wide array of 6T SRAM flip-flops plus some analog magic.)

hinkley 6y ago

So how many levels of cache do we have now between the ALU and the memory cell of record?

ChuckMcM 6y ago

I beg to differ: the thing that made the original AMD64 chips so freakishly awesome was multiple memory controllers in a multi-socket system. That forced Intel to do something kinda similar, but the Opteron line and then the Ryzen and EPYC lines have always been better than Intel in terms of raw memory bandwidth[1].

And when there was a clear need for the bandwidth, as in GPUs, circuit designers stepped up and created some amazing wide and deep memory bus architectures.

I would not be surprised to see AMD partner with IBM to utilize some of their EDRAM tech to keep themselves out ahead of Intel in the data center space. The more ways they can distinguish themselves, the more pressure they put on Intel's design teams.

[1] Yes, lookaside buffers and page tables in the Opterons took some of that advantage back, but it was still substantial.

jiggawatts 6y ago

> Gluing something like HBM2 to the die for a second

This is something AMD is already doing for EPYC, and they've already used HBM2 in their GPUs.

So I'm surprised they haven't released any CPU models with crazy huge L4 caches using a few GB of HBM2.

Then again, Intel made a laptop CPU with a huge 128MB cache and their comment was that it didn't make that big of a difference. I believe the performance boost was less than 5% for going from 64MB to 128MB.

baybal2 6y ago

It's all about how fast the memory is.

960 MiB is prodigiously large for such a microscopic chip, but if it "only" gains a 3-5x latency reduction over external DRAM, it's still very far from a proper L3 implementation, and far behind L2.

Make DRAM work at 1 GHz+, and then you will see miracles. Imagine a fully synchronous on-die DRAM that can sit just behind L1, or even be connected to load registers directly.

The problem is that effective frequencies for a memory round trip haven't gone up much since the nineties. If you work with 100% cache misses, your memory will still be running at an effective frequency of around 100 to 200 MHz.
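
One way to read the 100 to 200 MHz figure: when every access misses and each access depends on the result of the previous one, memory completes one access per round trip, so the effective frequency is the reciprocal of the round-trip latency. A minimal sketch of that conversion follows; the function name and the latencies in the loop are illustrative assumptions, not measurements from the thread.

```python
def effective_mhz(round_trip_ns: float) -> float:
    """Dependent accesses per microsecond (MHz) for a given round-trip latency."""
    # 1 access per round trip: 1 / (ns * 1e-9) Hz == 1000 / ns MHz
    return 1_000.0 / round_trip_ns


# Assumed round-trip latencies, for illustration only.
for ns in (5.0, 10.0, 50.0, 100.0):
    print(f"{ns:6.1f} ns round trip -> {effective_mhz(ns):6.1f} MHz effective")
```

Under this reading, 100 to 200 MHz corresponds to round trips of 10 ns down to 5 ns.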

hinkley 6y ago

Maybe there’s an inflection point where imperative management of the cache is more effective than heuristic management.

vvanders 6y ago

Read access patterns matter more than cache sizes; triple-digit improvements are possible if you have linear reads.
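
A toy model can show why the access pattern matters so much. The sketch below replays linear and scattered reads through a simulated direct-mapped cache and compares hit rates. The line size, cache size and working-set size are assumptions picked for illustration, and the model ignores prefetchers and DRAM row buffers, which favor linear reads even further on real hardware.

```python
import random

LINE_BYTES = 64         # assumed cache line size
NUM_LINES = 512         # assumed direct-mapped cache: 512 lines = 32 KiB
ELEM_BYTES = 8          # one 64-bit value per read
ARRAY_BYTES = 1 << 20   # 1 MiB working set, much larger than the cache


def hit_rate(addresses):
    """Replay byte addresses through a direct-mapped cache; return the hit rate."""
    slots = [None] * NUM_LINES   # line number currently held by each slot
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        slot = line % NUM_LINES
        if slots[slot] == line:
            hits += 1
        else:
            slots[slot] = line   # miss: fetch the line, evicting the old one
        total += 1
    return hits / total


n = ARRAY_BYTES // ELEM_BYTES
linear = [i * ELEM_BYTES for i in range(n)]
rng = random.Random(0)   # fixed seed so the run is repeatable
scattered = [rng.randrange(n) * ELEM_BYTES for _ in range(n)]

linear_rate = hit_rate(linear)
scattered_rate = hit_rate(scattered)
print(f"linear reads:    {linear_rate:.3f} hit rate")
print(f"scattered reads: {scattered_rate:.3f} hit rate")
```

With linear reads, each 64-byte line misses once and then serves the next seven 8-byte reads, so the hit rate is 7/8. Scattered reads over a working set 32 times the cache size hit only a few percent of the time.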

rbanffy 6y ago

Late Xeon Phis had 16GB of HBM around the die. It could either complement or mirror main memory.

toohotatopic 6y ago

Hasn't Intel bought the company that was on the brink of producing those CPU-memory combos? Unfortunately, I haven't been able to find the name of the company or an article about it.

hinkley 6y ago

With all of the speculation security bugs in chip cache management, I can’t help but wonder if we won’t eventually go full NUMA and turn the cache memory (or at least L2+) into a directly addressable space, either by the kernel or directly by application code. At which point your working set is explicitly on the processor, instead of implicitly.

I also wonder if chiplets will be the vehicle by which this comes to pass.

the8472 6y ago

Cache can already be made directly addressable (cache-as-ram mode). But outside of some niche solutions that's mostly done during boot due to the limitations of that mode and cache being too valuable.

deepnotderp 6y ago

Commodity DRAM latency is mostly array line dominated, not due to proximity/distance; see e.g. https://ieeexplore.ieee.org/document/6522354

Also, eDRAM is difficult to scale.

moring 6y ago

What is the state of adding higher-level logic to the DRAM chips? E.g. clear a whole DRAM row to 0 (blank memory for new processes). With today's processes, we can probably fit whole simple processor cores there easily.

imtringued 6y ago

This is called PIM (processing in memory) and UPMEM [0] has a working implementation.

[0] https://www.anandtech.com/show/14750/hot-chips-31-analysis-i...

baybal2 6y ago

The latest DDR standard, and supposedly HBM3 (in the making), support range requests.

kev009 6y ago

The same process tech is used on POWER9, and a lot of the physical design is shared between z and p, so this eDRAM is pretty mainstream in POWER.

OpenCAPI seems like it will provide latency almost as low as an on-die controller with DDR4-attached RAM, at much cheaper prices than contemporary designs, and it opens the door to third-party innovation.

The real issue is trading latency against bandwidth. IBM's scale-up designs have had ridiculously large memory bandwidth forever, but latency suffers to some extent with buffers like the SCs in the article. Whether that matters depends on your workload, but people need to break out of the lock/latch mindset and use safe memory reclamation techniques like RCU and EBR to avoid common critical sections for lifecycle management, and in general minimize synchronization, to make full use of current designs, even the now-common AMD Rome.

I firmly believe we've been in a time where better programmers, aware of the hardware/software interface, have been needed for a while, rather than facing some kind of hardware physics brick wall as the pundits often claim.
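
For readers who have not met RCU, the core idea is that readers never take a lock: a writer copies the shared structure, modifies the private copy, and publishes it with a single reference store. The sketch below shows that read-copy-update shape only. `RcuStyleMap` and its methods are hypothetical names, and Python's garbage collector stands in for the grace-period or epoch tracking that real RCU and EBR implementations use to decide when an old version can be freed.

```python
import threading


class RcuStyleMap:
    """Read-mostly map: lock-free reads, copy-on-write updates (illustrative)."""

    def __init__(self):
        self._current = {}                    # never mutated after publication
        self._write_lock = threading.Lock()   # serializes writers only

    def snapshot(self):
        # One reference load; the returned dict is a stable version.
        return self._current

    def read(self, key, default=None):
        return self._current.get(key, default)

    def update(self, key, value):
        with self._write_lock:
            new_version = dict(self._current)   # copy
            new_version[key] = value            # update the private copy
            self._current = new_version         # publish with one reference store
        # Old versions are reclaimed by the garbage collector once no reader
        # holds them; real RCU/EBR must track that explicitly.


m = RcuStyleMap()
m.update("a", 1)
old = m.snapshot()      # a reader holding the current version
m.update("b", 2)        # the writer publishes a new version
print(old, m.snapshot())
```

The reader holding `old` keeps seeing a consistent version while the writer publishes a new one, which is the property that removes the shared critical section from the read path.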

zong0dD2 6y ago

Quoting: 'The memory bottleneck'

(Raises the head from her electric soldering iron:) Maybe "in a world without any 'North- or Southbridge'-bottleneck" ?

Or have you actually figured out how to attach a keyboard directly to your graphics card (still powered by a PSU), 'cos... your graphics board has (plain and simple) CPU power, RAM and for sure 'ports' to attach something... ?! (-;

baybal2 6y ago

RantyDave 6y ago

Oh no, the packaging connections are (guessing) thousands of nanometres. Much easier to build, much cheaper. Also, the process of moving to multi-chip packages means the process yield for the chiplets is much higher. It's a bloody good idea :)

thedance 6y ago

Intel laptop parts had 128MB of eDRAM starting in 2013. Is that mainstream enough for you?

LargoLasskhyfv 6y ago

Had. And then floated away down the stream, never to be seen again, at least not with 128MB.

baybal2 6y ago

At what frequency did that eDRAM work?

If the effective frequency was 100 MHz, being hit by cache misses will indeed be "only" 2-3 times less bad than a full memory round trip.
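
The "2-3 times less bad" estimate can be put next to the usual average memory access time formula, hit time plus miss rate times miss penalty. The sketch below is only an illustration: the 100 ns DRAM round trip, the 80% eDRAM hit rate and the function name are assumptions chosen for the arithmetic, not figures for any Intel part.

```python
def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus miss rate times miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


DRAM_NS = 100.0        # assumed DRAM round trip
EDRAM_HIT_RATE = 0.8   # assumed share of last-level misses served by the eDRAM

for factor in (2.0, 3.0):
    edram_ns = DRAM_NS / factor   # eDRAM assumed 2x or 3x faster than DRAM
    with_edram = amat_ns(edram_ns, 1.0 - EDRAM_HIT_RATE, DRAM_NS)
    print(f"eDRAM {factor:.0f}x faster than DRAM: {with_edram:.1f} ns on average, "
          f"against {DRAM_NS:.1f} ns without it")
```

Under these assumptions the average improvement stays below the raw 2-3x latency ratio, because every access that misses the eDRAM still pays the full DRAM round trip on top of the eDRAM lookup.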

Dylan16807 6y ago

If they had ever made it available at the mid-high end or released it outside of a couple weird SKUs before they quietly buried it, then it would count.

RantyDave 6y ago · 8 in thread

Twelve point two billion transistors. That's absolutely nuts. Does anyone have a ballpark figure for how much a 'drawer' of four of these things costs? What's it supposed to run, is this an Oracle/DB2 beast?

tibbetts 6y ago

There is a version of DB2 for mainframe, but it’s a totally different codebase as I understand it. The operating system on a mainframe provides a lot of what modern app developers get from their database and caching systems, generally with better fault tolerance and availability. So if you have an app built for mainframe, it often will not have an external database dependency. Doing transactions can just look like writing to memory or files.

rodgerd 6y ago

When I was running Z for Linux the costing was in the order of six figures per processor.

unixhero 6y ago

Highly dense Docker hypervisor hypervisor hypervisor.

RantyDave 6y ago

Nobody's even mentioned Crysis.

DaiPlusPlus 6y ago

> is this an Oracle/DB2 beast?

My impression is that RDBMS workloads are generally far more IO-intensive than CPU-intensive.

Looking at my own Azure SQL CPU vs IO vs Log charts right now, even some CPU-heavy OLAP queries barely pass 25% CPU in the 2 vCore-sized database used in my current main project.

tsimionescu 6y ago

Wouldn't a DB benefit enormously from faster memory access? Perhaps you're only getting 25% CPU usage because the CPU is stalled waiting for main memory to reply...

imtringued 6y ago

An EPYC processor with 64 cores contains 40 billion transistors spread over 8+1 dies.

WC3w6pXxgGd 6y ago

The cost will decrease over time, just like all tech.

magicalhippo 6y ago · 7 in thread

Impressive tech. How big is the market for these machines these days? Like how many Z15 CPs would they expect to sell (assuming each Z15 install can vary a lot in size)?

pm90 6y ago

This is likely catering specifically to IBM's customers who have been with them since the mainframe days and continue to rely on IBM products (airlines, banking, etc.). The systems used by these orgs are massive in complexity, and I'm not sure how much they want to invest in refactoring them to run on COTS hardware... it probably doesn't make sense for them financially.

nabla9 6y ago

There is a market for reliability, security and scale in a compact size, so they can get new customers.

Companies like Robinhood may discover that it's actually cheaper to buy reliable and secure hardware and write software for it than to try to write fault-tolerant and secure software on COTS hardware.

dfox 6y ago

IBM has started to market what is essentially a z with only IFL CPs as a kind of k8s in a box, so they are obviously trying to expand into lower-tier markets.

microtherion 6y ago

I suppose there is a world market for maybe five of those…

hinkley 6y ago

I understood that reference.

okareaman 6y ago

No one will ever need one of these in their home

ksec 6y ago

Same question, and I wonder how much they cost too.

insulanian OP 6y ago · 5 in thread

Are there any modern initiatives around mainframe? What is IBM doing to promote it more to new generations?

Are there any startups doing anything related to the mainframe?

I was always fascinated by the tech around mainframe and am even thinking about moving into that space. I can imagine that the barrier to entry is high... or?

yingw787 6y ago

Mainframes are interesting to me (but I think a lot of things are interesting) :)

I found this one site with links to books on mainframes: http://www.mainframes.com/Books.html

If IBM is investing in 14nm tech for mainframes, it's not dead tech. A quick search revealed the following:

""" 70% of the world's production data, and 55% of world's enterprise transactions, took place on mainframes (2016) """

http://ibmmainframes.com/wiki/who-uses-mainframes.html

tal8d 6y ago

For an individual, yes, the barrier to entry is high. But I don't really see any way to change that, due to the very nature of mainframes: systems of complex and highly tuned special-purpose submodules. This is the first thing you'll notice when digging through the literature: lots of brand-new, non-standardized acronyms for purpose-built subsystems. While a lot of it certainly has the taste of needless market segmentation, there is a lot of unique stuff that is genuinely scarce. Yes, you can get a second-hand mainframe at a bargain price, but you really don't want to unless it is part of a much larger tax-advantaged living museum project. There are emulators though: Hercules and zPDT.

My interests led me more to the Power architecture, which is weird enough to hold my interest - while still being practical at the individual scale. For example, the z15 comes with the NXU compression accelerator - the p9 has a similar (same?) NX coprocessor. You'll also find market segmentation here, and IBM would do well to knock it off with the weird PowerVM/NV/Opal/ePAR/LPAR stuff. Their performance monitoring and scheduling stuff is really awesome, and it is unfortunate that they use it to segment product offerings. It isn't as ugly as Intel's ECC games, but it still isn't a good look.

lboc 6y ago

There's the Open Mainframe project, and Zowe I guess:

https://github.com/zowe

If you're interested, there's also the Master the Mainframe initiative. Mainly a competition for those in school, but the 'learning system' offers year-long free access to those of us to whom education is a distant memory...

https://www.ibm.com/it-infrastructure/z/education/master-the...

kanzenryu2 6y ago

The barrier is not always high https://www.youtube.com/watch?v=45X4VP8CGtk

insulanian OP 6y ago

Are you trying to say that it is wide and heavy as well :)

Joke aside, how are people actually getting started in this space? And more importantly, is it "worth it" financially? Is the mainframe skill shortage ("dreaded COBOL"?) a real thing?

smartstakestime 6y ago

This is my type of tech. Not glamorous but highly functional.
