Why do you think that is? Are there possibly other projects out there that I'm not familiar with?
- https://github.com/DataManagementLab/ScaleStore - "A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA"
- https://github.com/unum-cloud/udisk (https://github.com/unum-cloud/ustore) - "The fastest ACID-transactional persisted Key-Value store designed for NVMe block-devices with GPU-acceleration and SPDK to bypass the Linux kernel."
- https://github.com/capsuleman/ssd-nvme-database - "Columnar database on SSD NVMe"
Also https://www.snia.org/sites/default/files/ESF/Key-Value-Stora...
The device support section of Samsung's uNVMe evaluation guide (from 2019) just states:
Guide Version: uNVMe2.0 SDK Evaluation Guide ver 1.2
Supported Product(s): NVMe SSD (Block/KV)
Interface(s): NVMe 1.2
https://github.com/OpenMPDK/uNVMe/blob/master/doc/uNVMe2.0_S...
I can't find spec sheets detailing which NVMe command sets are supported, even for their enterprise drives.
Good overview: https://www.mydistributed.systems/2020/07/towards-building-h...
[1] These slides claim up to 32 bytes, which would be a practically useful length: https://www.snia.org/sites/default/files/ESF/Key-Value-Stora... but the current revision of the standard only permits two 64-bit words as the key ("The maximum KV key size is 16 bytes"): https://nvmexpress.org/wp-content/uploads/NVM-Express-Key-Va...
16 bytes is long enough that collisions will be super rare, and while you obviously need to write code to support that case, it should have no performance impact.
If so, that is probably the reason for a 16 byte key - there is just no way anybody needs a key bigger than 16 bytes for an address anytime soon.
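To put "super rare" in numbers, a quick birthday-bound estimate (a sketch, assuming keys are uniformly hashed into the 128-bit space):

```python
import math

def collision_probability(n_keys: int, key_bits: int) -> float:
    # Birthday bound: P(any collision) ~= 1 - exp(-n^2 / 2^(b+1));
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-(n_keys ** 2) / 2 ** (key_bits + 1))

# Even a trillion uniformly hashed 16-byte (128-bit) keys leave the
# chance of even one collision somewhere around 1e-15.
p = collision_probability(10 ** 12, 128)
```

So the "handle the collision case" code path is essentially dead weight you still have to write.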
The Azure Lv3/Lsv3/Lav3/Lasv3 series all provide this capability, for example.
Ref: https://learn.microsoft.com/en-us/azure/virtual-machines/las...
You might also be interested in xNVMe and the RocksDB/Ceph KV drivers:
https://github.com/OpenMPDK/xNVMe
Though I'm not super knowledgeable about it. I think Redfish/Swordfish are maybe meant for this sort of thing:
https://www.snia.org/forums/smi/swordfish
There's a video on NVMe and NVMe-oF management for instance:
https://www.youtube.com/watch?v=56VoD_1iGIs&list=PLH_ag5Km-Y...
> NVMe SSDs based on flash are cheap and offer high throughput. Combining several of these devices into a single server enables 10 million I/O operations per second or more. Our experiments show that existing out-of-memory database systems and storage engines achieve only a fraction of this performance. In this work, we demonstrate that it is possible to close the performance gap between hardware and software through an I/O optimized storage engine design. In a heavy out-of-memory setting, where the dataset is 10 times larger than main memory, our system can achieve more than 1 million TPC-C transactions per second.
https://github.com/aerospike/aerospike-server/blob/master/cf...
There are other occurrences in the codebase, but that is the most prominent one.
I’m also curious whether different, more performant data structures could be leveraged; if so, there may be downstream improvements for garbage collection, retrieval, and request parallelism.
The exact semantics vary per protocol but it’s a feature of most protocols at least in the currently used revisions: https://en.wikipedia.org/wiki/Native_Command_Queuing
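From the host side, that queuing only pays off if you keep many commands in flight. A simplified sketch using a thread pool over positional reads on a scratch file (a stand-in for the real thing; high-performance engines would use io_uring or SPDK against the raw device):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096
QUEUE_DEPTH = 32   # NCQ tops out at 32 commands; NVMe allows far deeper queues

# A scratch file standing in for a block device.
fd, path = tempfile.mkstemp()
data = os.urandom(BLOCK * 256)
os.pwrite(fd, data, 0)

def read_block(lba: int) -> bytes:
    # os.pread is positional, so many workers can safely share one fd.
    return os.pread(fd, BLOCK, lba * BLOCK)

# Keep QUEUE_DEPTH reads in flight; the kernel and device are free to
# complete them out of order, which is exactly what command queuing exploits.
with ThreadPoolExecutor(max_workers=QUEUE_DEPTH) as pool:
    blocks = list(pool.map(read_block, range(256)))
```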
But that's about it. And the latency is still worse than in-memory solutions.
Between that and the non-trivial effort needed to make this work in any sort of cloud setup (be it self-hosted k8s or AWS), it's a hard sell. If I really need latency above all, AWS gives me instances with 24TB RAM, and if I don't… why not just use existing kv-stores and accept the couple of µs extra latency?
Given, however, that most of the world has shifted to VMs, I don't think KV storage is accessible for that reason alone: the disks are often split out to multiple users. So the overall demand for this would be low.
Some U.2 drives even support thin provisioning, like how a hypervisor treats a sparse disk file but for physical hardware.
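The sparse-file analogy is easy to see from userspace (a sketch; thin-provisioned drives do the equivalent at the media level):

```python
import os
import tempfile

# A sparse file: the logical size is 1 GiB, but the filesystem allocates
# no blocks until something is actually written.
fd, path = tempfile.mkstemp()
os.truncate(fd, 1 << 30)        # logical size: 1 GiB, nothing written
st = os.stat(path)
logical = st.st_size            # reports the full 1 GiB
allocated = st.st_blocks * 512  # typically ~0 bytes actually allocated
os.close(fd)
os.unlink(path)
```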
One thing they don't tell you about NVMe is you'll end up bottlenecked on CPU and memory bandwidth if you do it right. The problem is after eliminating all of the speed bumps in your IO pathway, you have a vertical performance mountain face to climb. People are just starting to run into these problems, so it's hard to say what the future holds. It's all very exciting.
I like how you reference the performance benefits of NVMe direct addressing, but then immediately lament that you can't access these benefits across a SEVEN LAYER STACK OF ABSTRACTIONS.
You can either lament the dearth of userland direct-addressable performant software, OR lament the dearth of convenient network APIs that thrash your cache lines and dramatically increase your access latency.
You don't get to do both simultaneously.
Embedded is a feature for performance-aware software, not a bug.
Utilizing: https://memcached.org/blog/nvm-caching/, https://github.com/m...
TL;DR: Grafana Cloud needed tons of caching, and it was expensive, so they used extstore in memcached to hold most of it on NVMe disks. This massively reduced their costs.
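The core idea behind extstore can be sketched in a few lines (a toy model, not memcached's actual implementation): small hot values stay in RAM, large values get appended to a flash-backed file, and RAM keeps only (offset, length) metadata for them.

```python
import os
import tempfile

class TieredCache:
    """Toy extstore-style tiering: small values in RAM, large values on
    'flash' (a file), with only (offset, length) metadata kept in RAM."""

    def __init__(self, spill_path, ram_limit=64):
        self.ram = {}        # key -> bytes, for small/hot values
        self.index = {}      # key -> (offset, length) in the spill file
        self.ram_limit = ram_limit
        self.fd = os.open(spill_path, os.O_RDWR | os.O_CREAT, 0o600)
        self.tail = 0        # append-only write head

    def set(self, key, value):
        if len(value) <= self.ram_limit:
            self.ram[key] = value
        else:
            os.pwrite(self.fd, value, self.tail)
            self.index[key] = (self.tail, len(value))
            self.tail += len(value)

    def get(self, key):
        if key in self.ram:
            return self.ram[key]
        if key in self.index:
            offset, length = self.index[key]
            return os.pread(self.fd, length, offset)
        return None

cache = TieredCache(tempfile.mkstemp()[1])
cache.set("small", b"hi")          # stays in RAM
cache.set("large", b"x" * 1024)    # spills to the file
```

The RAM savings come from the index entries being a fixed handful of bytes regardless of value size, which is why it pays off most for large cached objects.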
> High-performance storage engines. There are a number of storage engines and key-value stores optimized for flash. RocksDB [36] is based on an LSM-Tree that is optimized for low write amplification (at the cost of higher read amplification). RocksDB was designed for flash storage, but at the time of SATA SSDs, and therefore cannot saturate large NVMe arrays.
From this slightly tangent mention, I am guessing not.
https://web.archive.org/web/20230624195551/https://www.vldb....
I mean, using a merkle tree or something like that to make sense of the underlying data.
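The merkle-tree idea in miniature (a sketch, assuming SHA-256 and a non-empty list of data blocks):

```python
import hashlib

def merkle_root(leaves):
    """Compute a Merkle root over a non-empty list of byte strings."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Changing any block changes the root, so two replicas can compare roots
# (and then subtrees) to pinpoint which blocks of underlying data diverge.
```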
(yes it's fashionable, but it's still terrible for random read performance)
I mean, what's the trick NVMe can do to be meaningfully faster?
Who could afford to develop and maintain such a niche thing, in today’s economy, without either a universal basic income or a “non-free” license to guarantee revenue?
Otherwise though…you have the file system. Is that not enough?
https://github.com/rails/solid_cache didn't include anything about NVMe that I could find.
So Solid Cache and Solid Queue just use the database (MySQL), which uses NVMe.
So now, in addition to: "You don't need a queue, just use Postgres/MySQL", we have "You don't need a cache, just use Postgres/MySQL"
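What "just use the database as a cache" boils down to is a table with a key, a value, and an expiry; a minimal sketch, with SQLite standing in for the MySQL/Postgres that Solid Cache actually targets:

```python
import sqlite3
import time

# A minimal database-backed cache table, in the spirit of Solid Cache;
# SQLite keeps this self-contained.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE cache (
    key TEXT PRIMARY KEY, value BLOB, expires_at REAL)""")

def cache_set(key, value, ttl=300):
    db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
               (key, value, time.time() + ttl))

def cache_get(key):
    row = db.execute(
        "SELECT value FROM cache WHERE key = ? AND expires_at > ?",
        (key, time.time())).fetchone()
    return row[0] if row else None
```

On an NVMe-backed database this is slower than RAM but fast enough for many workloads, which is the whole argument.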
Even more complex when you want any kind of redundancy, as you'd essentially need to build some kind of RAID-like layer into your database.
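The simplest form of that RAID-like layer is mirroring every write; a toy sketch (RAID-1-style, with two files standing in for two devices):

```python
import os
import tempfile

class MirroredStore:
    """Toy RAID-1-style mirroring: every write goes to two backing files;
    reads fall back to the mirror if the primary is unreadable."""

    def __init__(self, path_a, path_b):
        self.fds = [os.open(p, os.O_RDWR | os.O_CREAT, 0o600)
                    for p in (path_a, path_b)]

    def write(self, offset, data):
        for fd in self.fds:                # both replicas, same offset
            os.pwrite(fd, data, offset)

    def read(self, offset, length):
        for fd in self.fds:
            try:
                return os.pread(fd, length, offset)
            except OSError:
                continue                   # replica failed: try the next one
        raise IOError("all replicas failed")

store = MirroredStore(tempfile.mkstemp()[1], tempfile.mkstemp()[1])
store.write(0, b"hello")
```

Real redundancy also needs failure detection, resync, and write ordering, which is exactly the non-trivial part the comment above is pointing at.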
Also, a few terabytes of NVMe in RAID 10 plus PostgreSQL or similar covers about 99% of companies' needs for speed.
So you're left with the 1% needing that kind of speed.