but i totally agree, rethinking how data is stored is going to be key to the adoption of these types of new media. there's a company called vast data that has built out a full storage solution utilizing the unique properties of SCM, very cool https://vastdata.com/
However, even their 6-DIMM test produces only 300 Gbps, which is insufficient to saturate a modern 400 GbE network adapter for either reads or writes.
This would be most relevant on a "single master" system storing some sort of simple data, where consistency requirements mean that the "writer" cannot be distributed. In a situation like this, the NIC and the storage bandwidth are the ultimate limits.
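To put the two figures side by side (taking the 300 Gbps six-DIMM number quoted above and a 400 GbE NIC at line rate, both best-case):

```python
# Back-of-the-envelope check: does 6 DIMMs of SCM saturate a 400 GbE NIC?
# Both numbers are peak figures quoted in the thread, not sustained rates.
dimm_bandwidth_gbps = 300   # aggregate of the 6-DIMM test, in gigabits/s
nic_line_rate_gbps = 400    # 400 GbE adapter

headroom_gbps = nic_line_rate_gbps - dimm_bandwidth_gbps
print(f"NIC headroom: {headroom_gbps} Gbps "
      f"({dimm_bandwidth_gbps / nic_line_rate_gbps:.0%} of line rate used)")
# → NIC headroom: 100 Gbps (75% of line rate used)
```

So in this setup the storage side is the bottleneck, which is the point being made: the NIC never gets saturated.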
In general, Intel SSDs and Intel Optane have poor but consistent bandwidth, and consistently low latency. Coupled with the high price and small capacity, they have their niche, but they're not a clear winner in any category.
As a reference point for how crazy high bandwidths are these days, NVIDIA sells a turnkey solution with 200 GB/sec network bandwidth (1.6 Tbps!): https://www.nvidia.com/en-au/data-center/dgx-a100/
For a database system like MongoDB, this could be perfect, depending on the workload.
I'm still wondering if that interview cost me a level.
It is better to play along and only mention the real solution at the end, to finish on a high note.
> I'm still wondering if that interview cost me a level.
Unlikely. Candidate level is decided before the interview.
By this point we had both realised that this was a battle of wits: could he come up with a problem that i couldn't solve with a pipeline?
At the end of the interview, i had a pipeline that took up most of a piece of A4 paper to write out. I had won the battle, and was offered the job.
Of course, i would not advise you to actually write a pipeline like that in production, but it's a fun exercise.
Anyway, the moral of this story is that if the interviewer wants you to solve a problem a certain way, and you can solve it in a simpler way, then a good interviewer will mark you up, not down. Perhaps at Google they didn't; they don't really seem like a company that has it together.
Why did he have concrete numbers? Couldn't he simply have increased those numbers when you proposed the single-machine solution?
(RAM is obviously volatile.)
If you are storing data to disk at that speed, you fill even the biggest Optane drives in a couple of minutes. So it would be an application where you need to overwrite a huge amount of data over and over again.
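A quick sanity check of that fill time (assuming the ~300 Gbps figure from above as a sustained write rate, and a 3.2 TB drive as the large-capacity case; both numbers are my assumptions, and the exact model doesn't change the conclusion):

```python
# How long until a large Optane drive is full at SCM-class write speeds?
# Capacity and write rate are assumptions, not measured values.
capacity_bytes = 3.2e12                    # 3.2 TB drive (assumed)
write_gbps = 300                           # sustained write rate, gigabits/s
write_bytes_per_s = write_gbps * 1e9 / 8   # → 37.5 GB/s

fill_seconds = capacity_bytes / write_bytes_per_s
print(f"Drive full in {fill_seconds:.0f} s (~{fill_seconds / 60:.1f} min)")
# → Drive full in 85 s (~1.4 min)
```

Under two minutes at full tilt, hence the constant-overwrite workload requirement.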
It reminds me of the Bugatti Veyron situation, its tires lasting 15 minutes at top speed yet it runs out of fuel in 12.
As for write speed, you don't really need it unless you're doing file/data recovery, but even then you can just build indexes over the disk image.
AFAIK, very few (if any) are using SCM in their forensic expert workstations, so I can't really tell if you'd saturate the storage capacity or the bandwidth. But in digital forensics, NVMe drives have been a must for a while, so this would definitely help.
Another use is to have local storage. You want most of the communication to be between your CPU and RAM/SCM and networking only to transmit results. In case of mongodb that would be something like very heavy aggregation that needs to look at a lot of data but transfers relatively small amount of results to the client (think counting number of objects meeting an arbitrary set of specifications). MongoDB actually requires storage for writing intermediate results of some types of operations so it is not purely about read speed.
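As a sketch of that pattern (collection and field names are made up; `allowDiskUse` is the real MongoDB aggregation option that lets the server spill intermediate results to local storage):

```python
# Hypothetical aggregation: scan a large collection locally, ship back only
# tiny per-group counts. The heavy I/O stays between CPU and local SCM/NVMe.
pipeline = [
    {"$match": {"status": "active"}},                     # arbitrary spec
    {"$group": {"_id": "$category", "n": {"$sum": 1}}},   # count per group
]

# With pymongo this would run as (needs a live server, so not executed here):
#   db.events.aggregate(pipeline, allowDiskUse=True)
# allowDiskUse=True is what makes MongoDB write intermediate results to disk,
# so local storage speed matters even for a nominally read-only query.
print(len(pipeline), "stages")
```

The result set that crosses the network is a handful of counts; the gigabytes scanned and any spilled intermediates never leave the machine.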
I'm trying to address this with my versioned Open Source DBMS project, where only page fragments are stored and a bunch of them are read in parallel to reconstruct a full page in memory. Adding mini-pages plus a simple cache with hot data from several page fragments at some point is at least orthogonal:
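A minimal sketch of that read path (the fragment layout, naming, and thread-pool fan-out are all my assumptions for illustration, not the project's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096
FRAGMENTS_PER_PAGE = 4  # assumed: a full page is split into 4 fragments

def read_fragment(fragments: list, idx: int):
    """Stand-in for one small I/O request to low-latency storage."""
    return idx, fragments[idx]

def reconstruct_page(fragments: list) -> bytes:
    """Issue all fragment reads in parallel, then stitch them in order.

    On SCM the per-request latency is low enough that fanning out small
    reads can compete with one large sequential read of the whole page.
    """
    with ThreadPoolExecutor(max_workers=FRAGMENTS_PER_PAGE) as pool:
        results = pool.map(lambda i: read_fragment(fragments, i),
                           range(len(fragments)))
    ordered = sorted(results)  # fragments may complete out of order
    return b"".join(chunk for _, chunk in ordered)

# Demo: split a page into fragments, then rebuild it.
page = bytes(range(256)) * (PAGE_SIZE // 256)
frag_size = PAGE_SIZE // FRAGMENTS_PER_PAGE
frags = [page[i:i + frag_size] for i in range(0, PAGE_SIZE, frag_size)]
assert reconstruct_page(frags) == page
```

In a real engine the `read_fragment` call would be an actual storage read, and the fragments would come from different devices or offsets.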
Most of the _data_ is also not going to fit in 256 bytes anyway.
So, it might well be that someone is interested in only one or a few records. Why then fetch and cache a whole 4 KB page if latency is good in both cases (4 KB and 256 bytes)? On the other hand, I agree that you should probably cache more data from a hot page.
http://www.mongodb-is-web-scale.com/
(I can get better results for Key-Value storage by using an SQL Database--Postgres or MariaDB--for key/values over MongoDB. And if you know SQL well, you can get even better results using a real relational database and optimal queries than pulling out keys/values and ad-hocking your relations in some Javascript code or whatever these web kids are doing.)
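A toy illustration of the two styles, with SQLite standing in for Postgres/MariaDB (table and column names are made up):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")

# Style 1: an SQL database used as a key/value store -- the app gets an
# opaque blob back and does its relating in application code.
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
db.execute("INSERT INTO kv VALUES (?, ?)",
           ("user:1", json.dumps({"name": "ada", "team": "db"})))
blob = json.loads(db.execute("SELECT v FROM kv WHERE k = ?",
                             ("user:1",)).fetchone()[0])

# Style 2: real relations -- the join happens inside the database,
# instead of ad-hocking it over fetched values in application code.
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, "
           "team_id INTEGER)")
db.execute("CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO teams VALUES (1, 'db')")
db.execute("INSERT INTO users VALUES (1, 'ada', 1)")
row = db.execute("""SELECT users.name, teams.name
                    FROM users JOIN teams ON users.team_id = teams.id
                 """).fetchone()

print(blob["name"], row)   # → ada ('ada', 'db')
```

Style 1 works fine, but style 2 lets the query planner do the work the application code would otherwise reimplement.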