jandrewrogers 2y ago

I think the real argument is more nuanced. Where you see mmap() fail badly on Linux, even for read-only workloads, is under a few specific conditions: very large storage volumes, highly concurrent access, and non-trivial access patterns (e.g. high-dimensionality access methods). Most people do not operate data models under these conditions, but if you do, you can achieve large integer-factor gains in throughput by not using mmap().

Interestingly, most of these problems stem from theoretical limitations of cache replacement algorithms as drivers of I/O scheduling. There are alternative approaches to scheduling I/O that work much better in these cases, but mmap() can’t express them, so bypassing mmap() offers large gains there.
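
To make the distinction concrete, here is a minimal Python sketch (an illustration, not code from any database mentioned in this thread). With mmap() the application only touches memory, and the kernel's page cache decides what to fetch and evict. With explicit pread() calls the application chooses the order of I/O itself. The helper names, the 4096-byte page size, and the "sort the offsets" policy are all assumptions picked for the example; a real engine would use a far more sophisticated scheduler.

```python
import mmap
import os
import tempfile

PAGE = 4096  # assumed page size, for illustration only


def read_via_mmap(path, offsets, size):
    """Read records through a mapping; the kernel page cache
    decides which pages to fetch and which to evict."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return [m[off:off + size] for off in offsets]


def read_via_pread(path, offsets, size):
    """Read the same records with explicit pread() calls, issued
    in an order the application picks (hypothetical policy:
    ascending offset, each distinct offset read once)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fetched = {}
        for off in sorted(set(offsets)):
            fetched[off] = os.pread(fd, size, off)
        # Hand results back in the order the caller asked for them.
        return [fetched[off] for off in offsets]
    finally:
        os.close(fd)


def make_demo_file(pages=64):
    """Create a temp file where page i is filled with byte value i % 256."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(pages):
            f.write(bytes([i % 256]) * PAGE)
    return path


if __name__ == "__main__":
    path = make_demo_file()
    wanted = [40 * PAGE, 3 * PAGE, 17 * PAGE, 3 * PAGE]
    # Both paths return identical data; only who schedules the I/O differs.
    assert read_via_mmap(path, wanted, 16) == read_via_pread(path, wanted, 16)
    os.remove(path)
```

Both functions still go through the page cache here, so this shows only where the scheduling decision lives, not a performance difference. Systems that bypass mmap() typically go further, for example with direct I/O and their own buffer pool.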

pclmulqdq 2y ago

GP wrote a key-value store called LMDB that is constrained to a single writer, and often used for small databases that fit entirely in memory but need to persist to disk. There's a whole different world for more scalable databases.

hyc_symas 2y ago

"fit entirely in memory" is not a requirement. LMDB is not a main-memory database, it is an on-disk database that uses memory mapping.

LAC-Tech 2y ago

Can you explain "high-dimensionality access methods" to me? (Or if it's too big for an HN comment, maybe recommend a paper).

heisjustsosmart 2y ago

This guy talks a lot of crap. See his website for examples, and don't waste your time with him.

<<<There is one significant drawback that should not be understated. Algorithm design using topology manipulation can be enormously challenging to reason about. You are often taking a conceptually simple algorithm, like a nested loop or hash join, and replacing it with a much more efficient algorithm involving the non-trivial manipulation of complex high-dimensionality constraint spaces that effect the same result. Routinely reasoning about complex object relationships in greater than three dimensions, and constructing correct parallel algorithms that exploit them, becomes easier but never easy.>>>

http://www.jandrewrogers.com/2015/10/08/spacecurve/

ilyt 2y ago

I'd imagine the same kind of worst-case access would also be a problem when doing I/O the "classical" way.
