I'm aware that SRAM is 3-6x less dense, but it isn't uncommon these days to see people with >3x the DRAM they need, so this doesn't strike me as a terribly convincing justification.
I'm also aware that $/GB is insanely high for on-CPU SRAM, but that would also be the case for on-CPU DRAM, which is why DRAM is typically put on a separate die so that its process can be optimized independently. Does the SRAM process just not optimize as well? Does it have insane power/heat requirements? What goes wrong?
Or (puts on tinfoil hat) is JEDEC full of people who design DRAM memory controllers for a living?
Avoiding a single virtual memory access (particularly one serviced from a spinning disk) is worth an enormous speed-up in the mean access time.
http://www.sisoftware.co.uk/?d=qa&f=mem_hsw
L1: 4 clocks <-- SRAM
L2: 12 clocks <-- SRAM
L3: 36 clocks <-- SRAM
L4: 136 clocks (55ns) <-- eDRAM
DRAM: 193 clocks (80ns) <-- off-die DRAM
Clock: 2.5GHz (dynamic overclocking was disabled)
5cm travel: 1 clock
With SRAM you just have to open the right gate, whereas with DRAM you have to precharge the bitlines, open the word line, wait for the tiny signal to amplify up to logic level, and only then do you get to read it out. Worse, you need tons of logic to reorder memory accesses so they can exploit multiple hits on the same word line, or proceed simultaneously in different banks. And you need to refresh each word line periodically, which requires even more logic. There is a reason why the memory controller (not the cache, the controller) is a huge chunk of the die, roughly the size of 2 cores!

If we assume that L3 and L4 have similar management overhead, then all of this takes ~100 clock cycles in the comparison above, which dominates the other costs even if we disregard the savings from simpler logic in off-die SRAM (which, combined with travel time, accounts for 60 cycles).
I still don't understand why off-die SRAM isn't sensible.
But I think the real reason SRAM is not used is that it's trapped in an expensive/low-volume local maximum, and there's not enough demand to push it into a cheaper/high-volume state. Caches actually work pretty well.
From that point of view, when the array is large enough, the particular technology you adopt has only minor influence on the access time.