MDBM is pretty much an optimized persistent hash table. LMDB and WiredTiger aim to be full-fledged ACID compliant database storage engines with functionality similar to that of BerkeleyDB or InnoDB.
From my totally biased perspective, MDBM is utter garbage. They use mmap but make absolutely zero effort to use it safely. This was the biggest obstacle to overcome in developing LMDB; I had a few lengthy conversations with the SleepyCat guys about it as well. It's the reason it took 2 years (from 2009 when we first started talking about it, to 2011 first code release) to get LMDB implemented. If you want to call something a "database" you have to do more than just mmap a file and start shoving data into it - you have to exert some kind of control over how and when the mapped data gets persisted to disk. Otherwise, if you just let the OS randomly flush things, you'll wind up with garbage. As Keith Bostic said to me (private email):
"The most significant problem with building an mmap'd back-end is implementing write-ahead-logging (WAL). (You probably know this, but just in case: the way databases usually guarantee consistency is by ensuring that log records describing each change are written to disk before their transaction commits, and before the database page that was changed. In other words, log record X must hit disk before the database page containing the change described by log record X.)
In Berkeley DB WAL is done by maintaining a relationship between the database pages and the log records. If a database page is being written to disk, there's a look-aside into the logging system to make sure the right log records have already been written. In a memory-mapped system, you would do this by locking modified pages into memory (mlock), and flushing them at specific times (msync), otherwise the VM might just push a database page with modifications to disk before its log record is written, and if you crash at that point it's all over but the screaming."
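To make the ordering in that quote concrete, here is a minimal sketch in Python of "log record first, data page second" against a memory-mapped file. The names `wal_update`, `make_db`, and the record layout are made up for illustration; this is not Berkeley DB's or any real engine's code. Python's standard library has no `mlock`, so the sketch takes the naive route of forcing the log to disk before the mapped page is touched at all.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # hypothetical "database page" size for this sketch


def make_db(path, pages):
    """Create a zero-filled data file of `pages` pages."""
    with open(path, "wb") as f:
        f.write(b"\0" * (PAGE * pages))


def wal_update(log_path, db_path, page_no, payload):
    """Apply one change using log-before-data ordering."""
    # 1. Append the log record and force it to disk first.
    record = (page_no.to_bytes(4, "little")
              + len(payload).to_bytes(4, "little")
              + payload)
    with open(log_path, "ab") as log:
        log.write(record)
        log.flush()
        os.fsync(log.fileno())

    # 2. Only now modify the mapped page, then flush the mapping (msync).
    with open(db_path, "r+b") as db:
        with mmap.mmap(db.fileno(), 0) as m:
            start = page_no * PAGE
            m[start:start + len(payload)] = payload
            m.flush()
```

Because the page is not dirtied until the log record is on disk, the kernel cannot write it back too early. The price is an fsync plus an msync for every change, which is the kind of syscall overhead being complained about here.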
The harsh realities of working with mmap are what dictated LMDB's copy-on-write design - it's the only way to ensure consistency with an mmap without losing performance (due to multiple mlock/msync syscalls). None of these design considerations are evident in MDBM.
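For anyone unfamiliar with the copy-on-write idea, a toy in-memory model might help. `CowStore` and its methods are hypothetical names and this is in no way LMDB's actual code or page layout; it only shows the principle that committed data is never overwritten, and a new version becomes visible by switching a single root reference.

```python
class CowStore:
    """Toy copy-on-write store: committed versions are never modified."""

    def __init__(self):
        self.versions = [{}]  # append-only list of committed versions
        self.root = 0         # index of the currently committed version

    def snapshot(self):
        # A reader only needs to remember which version was current.
        return self.root

    def read(self, snap, key):
        return self.versions[snap].get(key)

    def commit(self, changes):
        # Copy the current version, change the copy, then publish it
        # by switching the root in one step.
        new_version = dict(self.versions[self.root])
        new_version.update(changes)
        self.versions.append(new_version)
        self.root = len(self.versions) - 1
        return self.root
```

A reader that took a snapshot before a commit keeps seeing the old version, and a crash before the root switch leaves the previous version untouched, so no log is needed to undo or redo anything.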
LMDB's mmap is read-only by default, because otherwise it's trivial to permanently corrupt a database by overwriting a record, writing past the end, etc. MDBM's mmap is read-write, and the only "protection" you get is a doc that tells you "be Vewwy vewwy careful!" Ridiculously sloppy.
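As a rough illustration of what a read-only mapping buys you, here is a small Python sketch using the standard `mmap` module. The helper name `try_overwrite` is made up for this example, and it demonstrates the operating-system mechanism only, not LMDB's or MDBM's API.

```python
import mmap
import os
import tempfile


def try_overwrite(path, access):
    """Map `path` with the given access mode and try to overwrite byte 0.

    Returns True if the write went through, False if the mapping refused it.
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            try:
                m[0:1] = b"X"
            except TypeError:
                # Python's mmap raises TypeError on writes to a read-only map.
                return False
            return True
```

Calling it with `mmap.ACCESS_READ` leaves the file untouched, while `mmap.ACCESS_WRITE` silently changes the stored record. In C the stray store into a read-only mapping would fault instead of raising an exception, but the effect is the same: the bug is caught instead of being persisted.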
LMDB's design and implementation are proven incorruptible. MDBM (and LevelDB and all its derivatives) are proven to be quite fragile. https://www.usenix.org/conference/osdi14/technical-sessions/...
Leaving reliability aside for a moment, there's also the issue of performance and efficiency. We used to use DBM-style hashes for the indexes in OpenLDAP, up to release 2.1. We abandoned them in favor of B-trees in OpenLDAP 2.2 because extensive benchmarking showed that BDB's B-trees were faster than its hash implementation at very large data sizes. The fundamental problem is that hash data structures are only fast when they are sparsely populated. When the number of data records you need to work with increases to fill the table, you start getting more and more hash collisions that result in lots of linear probes (or whatever other hash recovery strategy you're using). The other problem is that the very sparse/unordered nature of hashes makes them extremely cache unfriendly - you get zero locality-of-reference for groups of related queries. So as your data volumes increase, you get less and less benefit from the amount of RAM you have available. When the data exceeds the size of RAM, the number of disk seeks required for an arbitrary lookup is enormous, and every read is a random access. Using a hash for a large-scale data store is just horrible. (We tested this extensively a decade ago http://www.openldap.org/lists/openldap-devel/200401/msg00077... )
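The collision effect is easy to reproduce with a small simulation. The sketch below is a generic open-addressing table with linear probing, written only for illustration; `average_probes` is a made-up name and this is not how BDB's or MDBM's hash is implemented. It inserts random keys until the table reaches a given load factor and reports the mean number of slots examined per insert.

```python
import random


def average_probes(table_size, load_factor, seed=0):
    """Insert random keys with linear probing; return mean probes per insert."""
    rng = random.Random(seed)
    table = [None] * table_size
    n_keys = int(table_size * load_factor)
    total_probes = 0
    for _ in range(n_keys):
        key = rng.getrandbits(64)
        slot = key % table_size
        probes = 1
        while table[slot] is not None:  # collision: step to the next slot
            slot = (slot + 1) % table_size
            probes += 1
        table[slot] = key
        total_probes += probes
    return total_probes / n_keys


if __name__ == "__main__":
    for lf in (0.5, 0.9, 0.99):
        print(lf, average_probes(4096, lf))
```

The probe count climbs steeply as the table fills. Each extra probe in an in-memory table is cheap, but in a disk-backed hash every probe that lands on a different page can be another random read.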
Among other things, I like that LMDB has zero-copy reads and that's something I've taken care to preserve all the way through my layers.
Just wanted to say thanks for the great work. LMDB is a joy to work with.
Didn't bdb's linear hashing scheme extend the size of the hash table enough to keep it at the required loadfactor?
Do you have any idea if a sqlite 4 release is imminent? Will lmdb work with it right out of the gate?
Thanks.
If you want I'll go shove a few GB into an mdbm, drop caches, and time a lookup.
I agree that it's an apples to oranges comparison in any case.
It's an apples-to-oranges comparison only if MDBM wins significantly against LMDB. If they are comparable in timing, or e.g. MDBM is 20% faster, then it would be an apples-to-apples comparison: MDBM having a 20% speed advantage, and LMDB having every other possible advantage (memory safety, ACIDity, ordered retrieval, multiple databases, etc.)
LMDB is truly, incredibly, really marvelous. On 64-bit it comes close to being the end-all-be-all local KV-store. If your databases are not more than a few tens of megs each, the same is true for 32-bit processors as well.
Good examples: http://duktape.org/ (it might seem silly, but that right column makes people want to try it!) and http://redis.io (I bet this page wins over many folks: http://redis.io/topics/twitter-clone)
just use a std::unordered_map, or better yet a tbb::concurrent_unordered_map or whatever the equivalent is for your language
Practically speaking, Boost.Interprocess includes a shared memory hash table implementation. Boost Multi Index, which is a further generalisation of containers to allow the construction of database-like indexes, is also Interprocess compatible.
http://www.boost.org/doc/libs/1_57_0/doc/html/interprocess/a...
It seems that over the last year technology has been growing more rapidly than in any other period.
Fun times but so hard to keep track of everything!
While I'm sure someone out there will see this and say "wow, that's exactly what I need!" chances are that if you have these sorts of scale issues you're going to have to figure it out on your own.
I'd rather see a write-up of how they arrived at this particular conclusion than another non-database.
At any given time, you either have a need/problem, or you don't. If you DO, you evaluate the current tech available, and hopefully select something that fits your needs. You build out around said tech, and if your choice was correct, that means it's either solving your problem, or on its way to doing so.
If something comes along while you're implementing with your chosen solution, that looks similar, but better, it's only noise - because hey, you found a solution.
Just as we don't all re-write all of our code whenever a new language comes along (unless the thing in question was desperately in need of a re-write anyway) even if newer languages are nicer, we needn't switch DBs or frameworks for the same reasons.
1) it's non-stop
2) there seldom seems to be anything truly novel in a broadly meaningful way (i.e. esoteric, if anything)
3) there is rarely an objective improvement on existing options
I no longer feel compelled to replace or adopt though, precisely for those reasons.
I tend to agree with this statement. The entire stack appears to be going through a revolution.
The data layer in particular is seeing very rapid change after being largely (not entirely) static for decades.
mdbm performance is even better on FreeBSD than Linux because FreeBSD supports MAP_NOSYNC, which causes the kernel not to flush dirty pages to disk until the region is unmapped. Perhaps mdbm's release will finally get the Linux kernel team to provide support for that flag.
There's a mmap flag on Linux called MAP_LOCKED but I'm not sure how it behaves with MAP_SHARED, which mdbm uses (the man page isn't clear).
Could there be a comparison between these datastores and the traditional ACID compliant databases when it comes to retrieving actual data in a useful format? E.g. perhaps doing a join or an ordering of some sort? I don't expect databases (e.g. Oracle, MS SQL Server, DB2) to be faster in raw performance, but I do expect them to be faster in terms of total development time and bug fixing since the application developer wouldn't have to do the locking, page pinning/unpinning, etc. manually.
ln -s -f -r /tmp/install/lib64/libmdbm.so.4 /tmp/install/lib64/libmdbm.so
ln: invalid option -- 'r'
Try `ln --help' for more information.