The memory bottleneck is pretty much the only thing in CPU design that hasn't seen a dramatic improvement over the years. Eliminating it is the only obvious path left with an expectation of double-digit performance gains.
So we need either very big and very fast caches, or extremely wide and low-latency memory. Both options are quite costly.
Adding on-die DRAM that can run at 500 MHz or faster will surely require some specialty process with a lot of compromises, like the one in the article.
Gluing something like HBM2 to the die as a second option moves the cost from the specialty process to specialty packaging. Not much better.
In the end, the DRAM array is a bunch of analog magic, and the interface works by copying the row you want into an SRAM buffer on the chip, which you can then access however you want. The slowest operations in all that are the copies between the SRAM row buffer and the actual DRAM array. (What I call the SRAM buffer is usually called the “column sense amplifiers”, but at a high level it is in fact a surprisingly wide array of 6T SRAM flip-flops plus some analog magic.)
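To make the row-buffer behaviour described above concrete, here is a toy model in Python. The class name, the open-row policy, and every cycle cost are assumptions picked for illustration, not timings of any real DRAM part. It only shows why accesses that stay in the open row are cheap and accesses that keep switching rows are not.

```python
class ToyDramBank:
    """Toy model of one DRAM bank with a single row buffer (open-row policy)."""

    def __init__(self, row_size=1024, activate_cost=15, precharge_cost=15, column_cost=4):
        # All costs are hypothetical cycle counts, not real device timings.
        self.row_size = row_size
        self.activate_cost = activate_cost    # copy a row: DRAM array -> row buffer
        self.precharge_cost = precharge_cost  # close the currently open row
        self.column_cost = column_cost        # read out of the row buffer
        self.open_row = None

    def access(self, address):
        """Return the modeled cost, in cycles, of reading one address."""
        row = address // self.row_size
        cost = 0
        if self.open_row != row:
            if self.open_row is not None:
                cost += self.precharge_cost
            cost += self.activate_cost
            self.open_row = row
        return cost + self.column_cost


def total_cost(addresses):
    """Sum the modeled cost of a whole access pattern on a fresh bank."""
    bank = ToyDramBank()
    return sum(bank.access(a) for a in addresses)


# Same 4096 addresses, visited in two different orders.
sequential = list(range(4096))                                     # stays in one row for long runs
row_hopping = [(i % 4) * 1024 + i // 4 for i in range(4096)]       # changes row on every access

print("sequential :", total_cost(sequential), "cycles")
print("row hopping:", total_cost(row_hopping), "cycles")
```

Both patterns touch exactly the same addresses; only the order differs, so the gap comes entirely from the row copies.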
And when there was a clear need for the bandwidth, as in GPUs, circuit designers stepped up and created some amazing wide and deep memory bus architectures.
I would not be surprised to see AMD partner with IBM to use some of their eDRAM tech to keep themselves ahead of Intel in the data center space. The more ways they can distinguish themselves, the more pressure they put on Intel's design teams.
[1] Yes, lookaside buffers and page tables in the Opterons took some of that advantage back, but it was still substantial.
This is something AMD is already doing for EPYC, and they've already used HBM2 in their GPUs.
So I'm surprised they haven't released any CPU models with crazy huge L4 caches using a few GB of HBM2.
Then again, Intel made a laptop CPU with a huge 128MB cache and their comment was that it didn't make that big of a difference. I believe the performance boost was less than 5% for going from 64MB to 128MB.
960MiB is prodigiously large for such a microscopic chip, but if it "only" gains a 3-5x latency reduction over external DRAM, it's still very far from a proper L3 implementation, and far behind L2.
Make DRAM run at 1 GHz+, and then you will see miracles. Imagine a fully synchronous on-die DRAM that can sit just behind L1, or even be connected to load registers directly.
The problem is that effective frequencies for a memory round trip haven't gone up much since the nineties. If you work with 100% cache misses, your memory will still be working at an effective frequency of around 100 to 200 MHz.
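For anyone who wants to play with the numbers, here is a small sketch of the arithmetic behind "effective frequency". The function names and the cache figures (1 ns hit, 10 ns miss penalty) are my own assumptions for illustration. The only thing taken from the comment above is the 100 to 200 MHz range, which corresponds to 10 down to 5 ns per dependent access.

```python
def effective_frequency_mhz(round_trip_ns):
    """Dependent accesses per microsecond, i.e. an 'effective frequency' in MHz."""
    return 1000.0 / round_trip_ns


def average_access_ns(hit_ns, miss_penalty_ns, miss_rate):
    """Textbook average memory access time: hit time plus weighted miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


# The 100 to 200 MHz range corresponds to 10 down to 5 ns per access.
for ns in (10.0, 5.0):
    print(f"{ns:.0f} ns round trip -> {effective_frequency_mhz(ns):.0f} MHz")

# Hypothetical cache: 1 ns on a hit, 10 ns extra on a miss.
for miss_rate in (0.0, 0.1, 1.0):
    t = average_access_ns(1.0, 10.0, miss_rate)
    print(f"miss rate {miss_rate:.0%}: {t:.1f} ns per access, "
          f"{effective_frequency_mhz(t):.0f} MHz effective")
```

With the miss rate at 100% the cache only adds overhead, and the effective rate is set almost entirely by the memory round trip.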
I also wonder if chiplets will be the vehicle by which this comes to pass.
Also, eDRAM is difficult to scale.
[0] https://www.anandtech.com/show/14750/hot-chips-31-analysis-i...
OpenCAPI seems like it will provide latency almost as good as an on-die controller with DDR4-attached RAM, at much cheaper prices than contemporary designs, and it opens the door to third-party innovation.
The real issue is trading latency against bandwidth. IBM's scale-up designs have had ridiculously large memory bandwidth forever, but latency suffers to some extent with buffers like the SCs in the article. Whether that matters really depends on your workload. But people need to break out of the lock/latch mindset and use safe memory reclamation techniques like RCU and EBR to avoid common critical sections for lifecycle management, and in general minimize synchronization to make full use of current designs, even the now common AMD Rome.
I firmly believe that for a while now we have needed better programmers who are aware of the hardware/software interface, rather than having hit some kind of hardware physics brick wall, as the pundits often claim.
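For readers who haven't met epoch-based reclamation (EBR), here is a deliberately simplified, single-threaded simulation of the idea in Python. It is a sketch under assumptions: the class and method names are hypothetical, there is no real concurrency or atomics, and "freeing" just means moving an object onto a list. A real implementation (typically in C, C++ or Rust) has to deal with memory ordering, which this ignores.

```python
class EpochDomain:
    """Toy, single-threaded model of epoch-based reclamation (EBR)."""

    def __init__(self):
        self.global_epoch = 0
        self.active = {}   # reader id -> epoch observed when it entered
        self.limbo = []    # (epoch at retire time, object) awaiting reclamation
        self.freed = []    # stands in for memory actually being freed

    def enter(self, reader):
        # A reader announces itself instead of taking a lock.
        self.active[reader] = self.global_epoch

    def exit(self, reader):
        del self.active[reader]

    def retire(self, obj):
        # The writer has already unlinked obj; it may still be in use by readers.
        self.limbo.append((self.global_epoch, obj))

    def try_advance(self):
        # The epoch may only move on once every active reader has seen it.
        if all(e == self.global_epoch for e in self.active.values()):
            self.global_epoch += 1
            self._reclaim()
            return True
        return False

    def _reclaim(self):
        # Objects retired two or more epochs ago can no longer be referenced.
        keep = []
        for epoch, obj in self.limbo:
            if epoch + 2 <= self.global_epoch:
                self.freed.append(obj)
            else:
                keep.append((epoch, obj))
        self.limbo = keep


domain = EpochDomain()
domain.enter("reader-1")               # reader may hold a reference to old-node
domain.retire("old-node")              # writer replaced it and retires the old one
first = domain.try_advance()           # epoch 0 -> 1 succeeds
second = domain.try_advance()          # blocked: reader-1 is still in epoch 0
freed_while_reading = list(domain.freed)
domain.exit("reader-1")
domain.try_advance()                   # epoch 1 -> 2, old-node becomes reclaimable
freed_after_exit = list(domain.freed)
print(freed_while_reading, freed_after_exit)
```

The point is that the reader never blocks the writer and never takes a lock; it only delays when the old object is reclaimed.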
(Raises her head from her electric soldering iron:) Maybe "in a world without any northbridge or southbridge bottleneck"?
Or have you actually figured out how to attach a keyboard directly to your graphics card (still powered by a PSU), because... your graphics board has (plain and simple) CPU power, RAM and for sure 'ports' to attach something...?! (-;
If the effective frequency were 100 MHz, being hit by cache misses would indeed be "only" 2-3 times less bad than doing a full memory round trip.
My impression is that RDBMS workloads are generally far more IO-intensive than CPU-intensive.
Looking at my own Azure SQL CPU vs IO vs Log charts right now, even with some CPU-heavy OLAP queries, the 2 vCore-sized database used in my current main project barely passes 25% CPU.
Companies like Robinhood may discover that it's actually cheaper to buy reliable and secure hardware and write software for it than to try to write fault-tolerant and secure software on top of COTS hardware.
Are there any startups doing anything related to the mainframe?
I have always been fascinated by the tech around mainframes and am even thinking about moving into that space. I can imagine that the barrier to entry is high... or is it?
I found this one site with links to books on mainframes: http://www.mainframes.com/Books.html
If IBM is investing in 14nm tech for mainframes, it's not dead tech. A quick search revealed the following:
""" 70% of the world's production data, and 55% of world's enterprise transactions, took place on mainframes (2016) """
My interests led me more to the Power architecture, which is weird enough to hold my interest - while still being practical at the individual scale. For example, the z15 comes with the NXU compression accelerator - the p9 has a similar (same?) NX coprocessor. You'll also find market segmentation here, and IBM would do well to knock it off with the weird PowerVM/NV/Opal/ePAR/LPAR stuff. Their performance monitoring and scheduling stuff is really awesome, and it is unfortunate that they use it to segment product offerings. It isn't as ugly as Intel's ECC games, but it still isn't a good look.
If you're interested, there's also the Master the Mainframe initiative. Mainly a competition for those in school, but the 'learning system' offers year-long free access to those of us to whom education is a distant memory...
https://www.ibm.com/it-infrastructure/z/education/master-the...
Joke aside, how are people actually getting started in this space? And more importantly, is it "worth it" financially? Is the mainframe skill shortage ("dreaded COBOL"?) a real thing?