Price / GB of DRAM hasn't actually fallen much over the past 10 years.[1] LPDDR is still over $3/GB. UDIMM is still ~$3/GB, which is about the same as in 2010 / 2011. i.e. despite what you may have heard about the DRAM price collapse in 2019, the price floor of DRAM has stayed pretty much the same over the past 10 years.
Every other kind of silicon has gotten cheaper (NAND, other ICs), just not DRAM. And yet our need for DRAM keeps increasing, from in-memory datastores on servers to mobile phones with cameras shooting rapid 4K images.
Compared to NAND, or foundries like TSMC, there are clear roadmaps for where cost is heading and what cost reduction we can expect in the next 5 years, along with other outlooks. There is nothing of the sort in DRAM. At least I don't see anything to suggest we could see $2/GB DRAM, let alone lower. I don't see how EUV is going to help either; there won't even be enough EUV TwinScan machines to go around for the foundries in the next 3 years, let alone NAND and DRAM.
The only good news is that low / normal capacity ECC DRAM has finally fallen to ~$5/GB. (It used to be $10-20/GB.)
[1] https://secureservercdn.net/166.62.107.55/ff6.d53.myftpuploa...
It has been 10 years since the last company not known to be a party to DRAM cartels left the market.
Instead, server and accelerator vendors want ever faster and higher performance DRAM, and so these performance gains trickle down to consumers, but nothing is driving price down.
People on here literally think Moore's Law is a natural law, and that if computer hardware isn't getting 15% cheaper every year there must be funny business involved...
Since the barrier to entry is very high, there's not so much pressure to compete further on price.
But as a consequence, as I understand it, it will allow for the correction of single-bit memory flips. With regular DDR4 or previous generations, you don't get any error correction: any bit error in your DDR4 modules has the potential to corrupt data. If you want to be protected from that, you need to get ECC memory.
Unfortunately, anything with "ECC" in hardware tends to get labeled with an "enterprise" sticker. And that means a certain price level, and a certain power consumption. (Yes, I know you can get Ryzen boxes that work with ECC, but that's still PC-sized hardware for hundreds of dollars.)
If DDR5 can bring error correction to the masses - like in single board computers, 10W NAS boxes, smartphones - that would be pretty cool. But I'm not sure whether my reading of that is correct.
[1]: https://www.anandtech.com/comments/15912/ddr5-specification-...
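To make the "transparently fix a single flipped bit" idea concrete, here is a toy Hamming(7,4) single-error-correcting code in Python. It's only an illustration of the principle; the actual on-die ECC in DDR5 uses much wider codewords, and I'm not claiming it's this particular code.

    # Toy single-bit error correction with a Hamming(7,4) code (illustration only,
    # not how DDR5 on-die ECC is actually implemented).
    def hamming74_encode(d):
        """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_decode(c):
        """c: 7-bit codeword, possibly with one flipped bit -> corrected 4 data bits."""
        c = list(c)
        # Recompute the parity checks; the syndrome is the 1-based position of the error.
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        syndrome = s1 * 1 + s2 * 2 + s3 * 4
        if syndrome:                       # non-zero syndrome: flip the offending bit back
            c[syndrome - 1] ^= 1
        return [c[2], c[4], c[5], c[6]]

    data = [1, 0, 1, 1]
    word = hamming74_encode(data)
    word[5] ^= 1                           # simulate a single bit flip "in memory"
    assert hamming74_decode(word) == data  # the flip is corrected transparently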
I also expect servers to use two levels of ECC to provide chipkill and also to keep server RAM more expensive than consumer.
I once heard a rant from someone about how not reporting this to the OS is really bad for diagnosing issues, even for soft errors that are auto-healed. (It could have been from Bryan Cantrill, but I couldn't say for sure.)
How often have you actually heard of data corruption due to non-ECC memory? Either yourself, any degree of 'friend of a friend', or perhaps a study that looked into the matter with more success than I had. I don't mean a newspaper story, because exceptional cases get reported precisely because they're rare exceptions, not because they're common enough that we'd be likely to come across them in our lifetimes.
e.g. "2009 Google's paper "DRAM Errors in the Wild: A Large-Scale Field Study" says that there can be up to 25000-75000 one-bit FIT per Mbit (failures in time per billion hours), which is equal to 1 - 5 bit errors per hour for 8GB of RAM after my calculations. Paper says the same: "mean correctable error rates of 2000–6000 per GB per year". "
That is way off from what I'm seeing. When launching Factorio I use 90% of my 8GB RAM and never once have I noticed data corruption, and I could tell you how many hours I've played but that would be embarrassing.
The test I did in school with heated-up RAM (the internet said that's when flips should occur more often) also wrote many many gigabytes without a single failure.
Not sure what hardware or temperatures that source was running, but it's not DDR3/DDR4 at anything below hairdryer-melting temperatures, because that's where I had to stop the experiment, still with zero failures.
Not sure why memory bit flips have the reputation of being such an edge case. It could be that it was an edge case 20 years ago, but it clearly isn't anymore. Computer memory has changed too.
I have a ZFS/NFS server on a 2012 i7 with 4 GB of RAM. I use it primarily to store various torrents (up to ~250 GB each).
I have had my torrent client find single-chunk errors in a couple of torrents I was seeding (twice over a few TB worth of torrents). I recall reading that ZFS over NFS is particularly prone to this. I did some worried searching and remember finding that it was likely caused by memory errors being persisted to disk, but I don't have any links handy anymore.
I likely would not have noticed the corruption if the torrent client hadn't alerted me.
I think the title was something like "DNS squatting", but I can't find it at the moment.
Maybe this won't allow for the detection of multiple flips, and maybe won't even report single bit flips to the OS (it'll just fix them silently). I suppose there's no big need to support detection and reporting for the vast majority of use cases. Ryan Smith at Anandtech in the link above says as much: "Between the number of bits per chip getting quite high, and newer nodes getting successively harder to develop, the odds of a single-bit error is getting uncomfortably high. So on-die ECC is meant to counter that, by transparently dealing with single-bit errors."
But for my purposes, if just the correction capabilities are on par with DDR4 ECC I'd be absolutely fine with that. And I guess that goes for many people. Even while using ECC memory now at home, I'm not monitoring the correction statistics and I'm guessing few people do in general. It might as well be silent today if you ask me.
ECC should be end-to-end, so it detects and (hopefully) corrects errors anywhere along the path, not just within a chip.
Step 1 of handling a lot of ECC correction events is to reseat the DIMM, because often it's just an issue with the connection, not actually a memory defect.
And you may not care too much about reports of correction events, but you definitely want to see correction failures reported - the point is, after all, to avoid corruption.
The only thing that comes to mind in that category for me is the PC Engines APU2. The APU2 is a very neat piece of kit, don't get me wrong, but it being the only option is not great either.
The D-1529 can be TDP-limited to 20 W (at a whopping 1.3 GHz), but if your goal is idle power consumption under 20 W you are better off with the 35 W TDP D-1602, since it is only a dual core and has the lowest standby consumption. The D-1518 is also a popular choice, and is basically the same as the D-1529 but with a 35 W TDP limit, so it can turbo to 2.2 GHz.
You can get a barebones motherboard (even mini-ITX is available for making small machines) for about $500, or a complete computer for a bit under $1000, e.g. https://www.newegg.com/supermicro-sys-e300-8d-intel-xeon-pro...
Note: the Xeon-D series is not socketed, so the motherboard price includes the CPU. Make sure to get a board that takes full-size DIMMs, since there is not a lot of ECC laptop memory on the market.
https://mitxpc.com/products/ibox-v1000
I bought it from that page. I wanted fanless silence and I wanted ECC, and it took me a long time to find this machine, but find it I did and it works. Linux reports that ECC is indeed enabled (via edac-utils, I think). Measured with a Kill-A-Watt meter it does consume more than 20 W (maybe 30-ish, IIRC). I actually disabled some Linux power management, though, to get Bluetooth mice and keyboards to not lag after being idle for 5 seconds, so my system might draw more power than others. It drives a 4K display buttery smooth, and you can watch all the YouTube and Netflix you want on Linux.
https://www.nextplatform.com/2020/09/03/the-memory-area-netw...
"the shift from dedicated DDR4 memory controllers to Serdes-based, high speed differential signaling mixed with buffer chips on memory modules that can be taught to speak DDR4, DDR5, GDDR6, 3D XPoint, or whatever, is an important shift in system design and one that we think, ultimately, the entire industry will get behind eventually."
There's speculation that AMD is going to do the same: in Zen 2 and later designs the CPU chiplets are coupled with different IO dies depending on the design (Ryzen, Threadripper, Epyc), and swapping out the IO die for one that supports new/different memory types would be less work than taping out a whole new monolithic CPU.
How does that differ from Intel's approach?
[1] can be overcome via caching and other considerations, but purely from this aspect, that is the impact.
[2] Longer traces lead to higher capacitance, and the power estimation formula P = C * V^2 * f * a shows that this aspect alone will change power use. Everything on one die means less parasitic capacitance.
[3] If the defect density is the same and you have 10 defects per wafer, then you will see different yields depending on whether you make 10 vs 100 vs 1000 chips on that wafer. Chiplets are smaller than monolithic designs, so we can put more of them on one wafer, which improves yield independent of the process.
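Just to make [3] concrete, here's a quick sketch under a simple Poisson defect model (the 10-defects-per-wafer figure is only the illustrative number from above):

    # Illustrative only: a simple Poisson yield model, where defects land randomly
    # on the wafer and a die is "good" if it catches zero defects.
    import math

    defects_per_wafer = 10
    for dies_per_wafer in (10, 100, 1000):
        lam = defects_per_wafer / dies_per_wafer       # expected defects per die
        yield_fraction = math.exp(-lam)                # P(zero defects on a given die)
        good_dies = yield_fraction * dies_per_wafer
        print(f"{dies_per_wafer:5d} dies/wafer -> {yield_fraction:.1%} yield, ~{good_dies:.0f} good dies")
    # 10 dies -> ~37% yield, 100 dies -> ~90%, 1000 dies -> ~99%, same defect density.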
The AMD compute dies are only connected to the IO die and other compute dies (and power). All IO connections exclusively go through the IO die, so the IO die can be customized to change the IO of the CPU without changing anything about the compute dies. It would be entirely feasible to just re-spin the IO die to add support for different memory, Thunderbolt or other IO ports. The IO die is also made on a cheaper, lower-density and performance process (14 nm / 16 nm) than the compute dies (7 nm).
Moving stuff further away means waiting multiple cycles for a reply.
I wonder if DDR5 will be fast enough to compensate for the higher latency (or maybe they improved latency this time?)
To be honest I haven't yet felt any need to move off my current machine. I would only upgrade its GPU, but I can't do that because I can't afford a new GPU AND a new monitor (I use a CRT monitor with a VGA cable... it is a very good monitor, so there's no reason to replace it, but newer GPUs don't support it).
AFAIK DDR4 having higher latency than DDR3 is a myth. It has a higher CL number, but that's measured in cycles, so the higher CL number of DDR4 is compensated by its higher clocks. The actual latency (measured in nanoseconds) is about the same, or slightly lower, in DDR4 than DDR3.
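Rough numbers to illustrate (the specific CL/speed pairs below are just typical-looking examples I'm assuming, not datasheet quotes). CAS latency in ns is CL divided by the clock, i.e. CL * 2000 / data rate in MT/s:

    # CAS latency in nanoseconds = CL cycles / clock frequency.
    # The data rate in MT/s is twice the clock, hence the factor of 2000.
    def cas_ns(cl, data_rate_mt_s):
        return cl * 2000 / data_rate_mt_s

    print(cas_ns(11, 1600))   # DDR3-1600 CL11 -> 13.75 ns
    print(cas_ns(22, 3200))   # DDR4-3200 CL22 -> 13.75 ns (stock-style timings)
    print(cas_ns(16, 3200))   # DDR4-3200 CL16 -> 10.0 ns  (typical enthusiast kit)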
https://www.anandtech.com/show/16143/insights-into-ddr5-subt...
Core-to-RAM latency is in the neighborhood of 50 ns (well-tuned Intel system with low-latency memory) to ~80 ns (bottom-of-the-barrel system). At propagation speed, that's about 10 meters. A big chunk of this latency is internal to the CPU (so is not influenced by distance to the memory at all), another big chunk is the inherent slowness of accessing a DRAM array (10+ ns, independent of the location of the memory).
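For scale, assuming a signal speed of roughly two-thirds of c on copper (an approximation, not a measured value):

    # Distance a signal travels in one memory-access time, assuming ~0.66c on copper.
    signal_speed_m_per_s = 2.0e8
    for latency_ns in (50, 80):
        distance_m = signal_speed_m_per_s * latency_ns * 1e-9
        print(f"{latency_ns} ns -> {distance_m:.0f} m of signal travel")
    # 50 ns -> 10 m, 80 ns -> 16 m: the few centimetres to the DIMM are a tiny
    # fraction of the latency budget; most of it is spent elsewhere.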
It's worth pointing out how little this has changed over the past decades. A 2006 AMD CPU is 100% competitive with regard to memory latency with Intel's 2020 flagship desktop CPU.
[1] https://www.eurogamer.net/articles/digitalfoundry-2019-moder...
Switching between games is a non-trivial thing (yeah yeah, mock me, first-world problems) on the PS4, in that the active game must be quit first, then the new one loaded. You may not be at a suitable spot to quit either, far from a save point. On the PS5 (or XSX) it's supposedly a very quick alt-tab kind of thing.
So far, I'm kind of happy in secret that the rest of the family prefers the Nintendo :)
Also factor in that it usually takes a few years until the full potential of a platform is tapped, so the PS4 is in its prime days for games.
The after-purchase testing was even better than expected: the i7 920 compiled a specific compilation unit in 14 seconds, the 3900X in 3.5 seconds. And that was before any tuning; I had done some BIOS tuning on the i7 920, while on the 3900X I just limited the power to make it quieter. The IPC is more than double; the larger caches are probably what pushes IPC beyond expectations. (I got more cores to improve scaling of multithreaded code.) Both times were recompilations where there was enough RAM to have all the files cached, and the folders used were on SSDs, NVMe on the modern CPU and SATA on the older one. Even if the NVMe matters, that upgrade wouldn't have been possible without getting a modern motherboard. The build system was make, and what I compiled was LLVM tutorial code in a single file that included many LLVM headers. The software wasn't upgraded between those runs; I just moved the disks from the old system to the new one and copied the data.
I did look, before the purchase, at whether there were IPC improvements that would have made it more sensible to buy new than to buy some Westmere 6-core to upgrade my system. The results made it clear to me that getting any cheap new CPU would have been preferable to wasting time on the Westmere. The IPC improvements for compilation were way higher than the average IPC improvements.
But even Sandy Bridge was weak enough in the benchmarks that it would have made sense to upgrade from that. Before the purchase I just looked in the Phoronix benchmarks / OpenBenchmarking database for a compilation benchmark with the worst CPU scaling across more cores and used that as an approximation for single-threaded compilation.
My own results were much larger than what I assumed they would have been, based on comparing Sandy Bridge to modern CPUs and then multiplying that by the clock speed and IPC advantage of Sandy Bridge over the i7 920.
Oh, when I got the i7 920 I decided not to upgrade until I could get 8 cores. Then AVX-512 happened and I knew I must have it so that I could play around optimizing code with it, but Intel just couldn't get its 10 nm out soon enough for me to get those 8-core AVX-512 parts at a reasonable price and power envelope. I just did the math and realized I was wasting time, because a slow CPU would cost me more over the next 2 years than upgrading.
My very naive textbook knowledge is that every DRAM bit uses up a single transistor and a capacitor, whereas an SRAM cell uses up 6 transistors.
How is it then, with all the scaling so far, that traditional SRAMs haven't caught up with DRAM capacities? A single DRAM chip is huge compared to the total die size of any microprocessor.
As the sibling comment asks about cheaper DRAM, I'm trying to understand why SRAM hasn't caught up yet from a price/GB perspective.
I don't know why you would expect a 6T SRAM cell to ever be smaller than a 1T DRAM cell given that both of them are scaling. Also, DRAM die sizes appear to be 40-80 sq. mm which is smaller than processors. https://www.semiconductor-digest.com/2019/09/13/dram-nand-an...
First, DRAM and SRAM are more than just the transistors: there are also the lines going into each of the transistors carrying the signal, and all the control circuitry around those transistors. When you write out, you aren't just involving the 6 transistors that store the bit, but a whole host of control transistors.
Next up, changes in current on a wire induce current on surrounding lines. This induced current results in what's known as "crosstalk". There are a bunch of methods to combat this; the primary one is to make sure there is enough space between lines to avoid it. This means that while your transistor size may get smaller and smaller, you still have a limit on how close you can place those transistors, otherwise you risk unwanted bit flips. DRAM has a major advantage here simply because it requires fewer lines to control state. That results in a denser packing of memory.
With those two points in mind, there's simply no way for SRAM to ever have the same price/GB or density as DRAM (without the market screwing with prices).
Leakage.
As FETs get smaller they leak more. CMOS Logic has dealt with this by having "dark silicon" -- yes, you get twice as many transistors as the last generation, but you can't use as many of them at the same time. You have to keep some of them turned off. But turning off SRAM means lost data, so "dark SRAM" is useless -- unlike, say a "dark vector unit" or "dark floating point unit".
DRAMs can optimize the entire process for just one job -- the 1T1C access gate -- to keep leakage at bay. Or if all else fails, just refresh more often, which hurts standby power but isn't a meaningful contributor to active-mode power.
From a systems perspective, lots of work has gone into hiding DRAM's faults and highlighting its strong points, so a system where DRAM is replaced with SRAM would be more expensive but would not realize most of the possible benefits without major redesigns of the memory system.
Intel has some Xeons with over 70 MB of L3, and also released some eDRAM chips to play around with this idea, but notice they used eDRAM to get 128 MB of L4 on a consumer chip: SRAM is still very expensive!
That looks like a 20%+ improvement IF you are bottle-necking on DDR4.
There are even 4800 MHz DDR4 DIMMs available now, even if they are niche.
EDIT: DDR4 is 3200 not 3800.
The short of it is it's very hard to make predictions here.
There is no telling what the memory controller to RAM clock ratio will be with DDR5. The memory controller has speed limits, so you aren't going to get free speed just because DDR5 starts at 4800, because the Zen memory controller can't run that fast anyway.
Source? I'm not aware of Zen 2 having a "smarter" memory controller that improves performance. AFAIK the only improvements they did implement (that could be construed as relating to the memory subsystem) were larger caches and a better branch predictor.
e.g. something as simple as CPU utilization, GPU utilization, RAM speed utilization?
It uses CPU performance counters to show things like ITLB_Misses or MEM_Bandwidth. It won't show when you're waiting for GPU/SSD/etc because those aren't visible from CPU performance counters. I'm not aware of a single tool that will do everything, unfortunately.
Also, this isn't a "benchmarking suite"; it's a tool you can use to instrument whatever load you're running, which I'd say is better. It's often used to improve software but could also identify if faster RAM will help.
If it's computation, it's more complicated to discover the bottleneck (your problem may be cache misses, memory bandwidth, or an architecture that doesn't go well with the algorithm).
Point is, you need to specify the task.
[0]: https://twitter.com/AMDRyzen/status/1312080706739339266?
[1]: https://www.amd.com/en/events/gaming-2020?sf238352749=1&sf23...
Also, going from 2 to 1 seconds is pretty huge if you're doing some operation hundreds or thousands of times a day.
(putting aside the fact that faster RAM does not generally result in faster load times)
I used Emacs back when people joked “Eight Megs and Constantly Swapping”
On the whole, I find these 20% memory performance upgrades (leading to a fraction of that in real-world performance) more obnoxious than anything, but I'd love ECC!
I do, however, build all of my computers from pieces found in the recycling, so my statistics may be a bit off from the norm...
Without ECC RAM, how would you know that you had a single bit flip? How would you know that you needed ECC RAM?
When you talk to people who run server systems, you'll find there's plenty of bit flips. This expertise is getting harder to find though, as more people run systems in the "cloud" where there's no visibility into the physical error statistics.
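If you want to look for flips yourself without ECC, one very crude userspace check (in the spirit of tools like memtester, but nowhere near as thorough; it can't tell a DRAM flip from a cache, CPU, or software bug, and the OS may swap the pages out) is to park a known pattern in memory and re-verify it later:

    # Crude bit-flip check: fill a large buffer with a pattern, wait, verify.
    import time

    SIZE_BYTES = 512 * 1024 * 1024          # 512 MiB; adjust to your free RAM
    PATTERN    = 0xA5                        # alternating-bit pattern

    buf = bytearray([PATTERN]) * SIZE_BYTES  # fill the buffer with the pattern
    time.sleep(60 * 60)                      # let it sit for an hour

    flips = sum(1 for b in buf if b != PATTERN)
    print(f"{flips} corrupted bytes out of {SIZE_BYTES}")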