LevelDB: A Fast Persistent Key-Value Store

(google-opensource.blogspot.com)

169 points · skanuj · 14y ago · 62 comments

The inventors are Jeff Dean and Sanjay Ghemawat. More here: https://plus.google.com/118227548810368513262/posts/1UEtSkKp1vv

54 comments · 20 top-level

gleb · 14y ago · 7 in thread

The synchronous writes benchmark is interesting. This is normally bound by the number of seeks your disk can do per second, which is mostly a function of rotational speed. With a 7200 RPM drive you get 7200/60 = 120 of these a second. So the 100 and 110 numbers for the competitors make sense. 2,400 for LevelDB does not.

Is LevelDB batching writes or is there something more interesting going on?
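The arithmetic behind that bound can be written out as a back-of-the-envelope model. This is only a sketch: it assumes, as the comment does, that each synchronous write costs about one full platter rotation and nothing else, and `max_sync_writes_per_sec` is a made-up helper name.

```python
def max_sync_writes_per_sec(rpm: float) -> float:
    """Rough upper bound on synchronous writes per second.

    Assumption: every synchronous write waits for one full rotation,
    so the bound is simply rotations per second.
    """
    return rpm / 60.0


if __name__ == "__main__":
    for rpm in (5400, 7200, 15000):
        bound = max_sync_writes_per_sec(rpm)
        print(f"{rpm} RPM -> about {bound:.0f} sync writes/s")
```

Under this model a 7200 RPM drive tops out at 120 synchronous writes per second, which is why a figure of 2,400 suggests that something other than one rotation per write is happening.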

agazso · 14y ago

If you are writing sequentially, then you can write more than the number of seeks.

And that is exactly what LevelDB is doing: writing a log (sequential), and when the in-memory chunk is full, it is writing it to disk sorted (this is also sequential).
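A toy version of that write path, for illustration only. This is not LevelDB's code or file format: `TinyLSM`, the tab-separated text files, and the `memtable_limit` parameter are all assumptions made for the sketch, and keys and values are assumed to contain no tabs or newlines.

```python
import os
import tempfile


class TinyLSM:
    """Toy log + in-memory chunk store; shows the write path only."""

    def __init__(self, directory: str, memtable_limit: int = 4):
        self.directory = directory
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.table_count = 0
        log_path = os.path.join(directory, "write.log")
        self.log = open(log_path, "a", encoding="utf-8")

    def put(self, key: str, value: str) -> None:
        # 1) Append to the log: a sequential write, wherever the key sorts.
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        # 2) Update the in-memory chunk.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> str:
        # 3) Write the whole chunk in key order, again sequentially.
        #    (A real store would also retire the old log here; omitted.)
        name = f"table-{self.table_count:04d}.txt"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as table:
            for key in sorted(self.memtable):
                table.write(f"{key}\t{self.memtable[key]}\n")
        self.table_count += 1
        self.memtable.clear()
        return path

    def close(self) -> None:
        self.log.close()


def demo() -> list:
    """Insert keys out of order and return the key order in the flushed file."""
    with tempfile.TemporaryDirectory() as directory:
        db = TinyLSM(directory, memtable_limit=3)
        for key in ("pear", "apple", "mango"):
            db.put(key, key.upper())
        db.close()
        table_path = os.path.join(directory, "table-0000.txt")
        with open(table_path, encoding="utf-8") as table:
            return [line.split("\t")[0] for line in table]


if __name__ == "__main__":
    print(demo())
```

Neither step needs a seek per inserted key: the log is append-only, and the sorted file is written front to back in one pass.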

leif · 14y ago

flushing the log in an LSM is only kinda sequential, sadly

leif · 14y ago

Data structures which require a disk seek per random insert are obsolete. LevelDB is using a Log-Structured Merge Tree, one of many write-optimized data structures (but not the best).

DanWaterworth · 14y ago

This link, comparing LSM trees with fractal trees, is quite interesting: http://www.quora.com/What-are-the-major-differences-between-...

stephth · 14y ago

> Is LevelDB batching writes

Yes, updates can be done in one atomic batch. Please correct me if I'm wrong, but I don't think Tokyo Cabinet allows it without Tokyo Tyrant.
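One way to picture an atomic batch is as a group of updates that reaches the log as a single record, so that recovery sees either all of them or none. The sketch below illustrates that idea only; it is not LevelDB's `WriteBatch` implementation or log format, and the JSON-lines encoding, `encode_batch`, and `replay` are assumptions made for the example.

```python
import json


class WriteBatch:
    """Collects updates so they can be logged as one unit."""

    def __init__(self):
        self.ops = []

    def put(self, key: str, value: str) -> None:
        self.ops.append(["put", key, value])

    def delete(self, key: str) -> None:
        self.ops.append(["delete", key, None])


def encode_batch(batch: WriteBatch) -> str:
    """Serialize the whole batch as ONE log record (a single line)."""
    return json.dumps(batch.ops) + "\n"


def replay(log_text: str) -> dict:
    """Rebuild state from a log; a torn final record is dropped whole."""
    state = {}
    for line in log_text.splitlines(keepends=True):
        if not line.endswith("\n"):
            break  # record cut off mid-write: none of its ops are applied
        for op, key, value in json.loads(line):
            if op == "put":
                state[key] = value
            else:
                state.pop(key, None)
    return state


if __name__ == "__main__":
    first = WriteBatch()
    first.put("b", "0")
    second = WriteBatch()
    second.put("a", "1")
    second.delete("b")
    record = encode_batch(second)
    print(replay(encode_batch(first) + record))       # both batches applied
    print(replay(encode_batch(first) + record[:-5]))  # torn second batch ignored
```

Note that batching in this sense is about atomicity; it also reduces the number of log writes, but it does not by itself explain a high synchronous-write rate.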

groby_b · 14y ago

If you write full disk blocks, wouldn't the disk cache hide the seek latency?

gleb · 14y ago

Having the disk write cache on would certainly explain it. But that leaves the question of the discrepancy with the competitors' numbers.

You turn off write-back caching on disks when you run a database, unless you are willing to accept corruption (which is worse than data loss) on a power outage. And that's why you can't get acceptable write performance out of a database without a battery-backed RAID controller (or some other kind of RAM-based write cache with a battery backup).

Here's a simple way to test # fsyncs/s (a.k.a. commit rate) on your system:

  sysbench --test=fileio --file-fsync-freq=1 --file-num=1 \
   --file-total-size=16384 --file-test-mode=rndwr run --max-time=10 \
   | grep "Requests/sec"
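If sysbench is not at hand, a rough equivalent can be sketched with the standard library. Treat this as an approximation under stated assumptions: it appends to a single temporary file rather than doing random writes, the temp directory may sit on a RAM-backed filesystem, and `fsyncs_per_second` is a made-up helper name.

```python
import os
import tempfile
import time


def fsyncs_per_second(duration_s: float = 1.0, payload: bytes = b"x" * 512) -> float:
    """Count write+fsync pairs completed in roughly `duration_s` seconds."""
    fd, path = tempfile.mkstemp()
    try:
        count = 0
        start = time.perf_counter()
        while time.perf_counter() - start < duration_s:
            os.write(fd, payload)
            os.fsync(fd)  # block until the OS reports the data as flushed
            count += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return count / elapsed


if __name__ == "__main__":
    print(f"~{fsyncs_per_second():.0f} fsyncs/s")
```

A result far above what the drive's rotational speed allows usually means a write cache somewhere in the stack is acknowledging the flush early.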

gaborcselle · 14y ago · 5 in thread

Hi there! I'm a YC alum (reMail W09) and helped Jeff and Sanjay with LevelDB. Let me know if you have any questions about LevelDB and I'll see if I can help.

clumsysmurf · 14y ago

I would really like to use this from my Android Java application. Is this possible, and what would be the best way to accomplish this?

aashay · 14y ago

How does this compare to other persistent key-value stores such as Membase?

dlsspy · 14y ago

Membase is a clustered data storage service your application uses.

LevelDB is a persistence library.

That makes LevelDB the kind of thing you plug into Membase to get the unique properties it has to offer (or at least for fun).

gojomo · 14y ago

Any comparisons of performance or functionality against BerkeleyDB?

sallen · 14y ago

Does this system have transactions and ACID guarantees?

swah · 14y ago · 4 in thread

Interesting how, like in the open-sourced protobuf, there are no commits by Jeff or Sanjay...

shadowmatter · 14y ago

Jeff and Sanjay wrote the original protocol buffer implementation. The project was taken over by Kenton Varda, who rewrote the C++ and Java parts; this is what was open sourced. See http://temporal.fateofio.org/files/resume

swah · 14y ago

Which is what I pointed out as being interesting.

gaborcselle · 14y ago

dgrogan and I have been batching changes to LevelDB from our internal code repository to put them on the Google Code page. Playing Google Code site admin didn't seem to me like a good use of Jeff and Sanjay's time.

swah · 14y ago

Yep, it's great you guys could separate it from internal dependencies! Congrats.

newhouseb · 14y ago · 4 in thread

How is this different than BDB?

davidhollander · 14y ago

BDB is a key/value store for unordered data, more similar to Tokyo Cabinet hash databases. Tokyo Cabinet hash databases are a much faster option than BDB if you only need unordered data.

LevelDB is for if you need ordered data, and a more appropriate comparison would be against a B+ tree database.

stephth · 14y ago

> LevelDB is for if you need ordered data

LevelDB is slower with random reads, but that doesn't mean you shouldn't use it for unordered data - it's still quite fast.

stephth · 14y ago

you don't need to pay anyone to use it in your commercial software.

newhouseb · 14y ago

I suppose if you are shipping proprietary binaries, then yes. But otherwise it's effectively GPL'ed.

gojomo · 14y ago · 3 in thread

An interesting development a while back that I'm surprised hasn't received more attention was Oracle's release of a SQLite-based interface to BDB:

http://www.oracle.com/technetwork/database/berkeleydb/overvi...

It's essentially drop-in compatible with SQLite, but with added concurrency and speed for most operations. (The concurrency addresses a major issue usually keeping SQLite as a prototyping/single-user-only option in web development.)

With LevelDB as a BSD-licensed alternative to BDB, I wonder:

(1) How would the LevelDB-vs-SQLite benchmarks change against SQLite+BDB backend?

(2) Could a SQLite fork with a LevelDB backend get a performance boost?

gaborcselle · 14y ago

Thanks for the link! You could theoretically just compile this file against SQLite-based BDB and get the numbers yourself: http://code.google.com/p/leveldb/source/browse/trunk/doc/ben... (If you do, please post them here.)

est · 14y ago

> SQLite-based interface to BDB

One thing I don't get about the SQL API for BDB: how does something like

    select * from users where name!='tom'

work?

gojomo · 14y ago

You really don't know or care that you're using BDB; it works (to the user) just like SQLite. (Behind the scenes, it's using BDB for the tables/indexes, and so would do various full- or partial- table scans much like SQLite's native on-disk format.)
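A minimal sketch of what such a scan amounts to when the table lives in an ordered key-value store. It does not reflect the actual SQLite or BDB query planner: the dict-of-rows layout and the helper names are assumptions for illustration, and SQL NULL semantics are ignored.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]


def scan(table: Dict[int, Row]) -> Iterator[Tuple[int, Row]]:
    """Walk the table in primary-key order, as a cursor over sorted keys would."""
    for rowid in sorted(table):
        yield rowid, table[rowid]


def select_where_name_not(table: Dict[int, Row], name: str) -> List[Row]:
    # A `name != ?` predicate cannot be answered by one index lookup,
    # so every row is visited and filtered.
    return [row for _, row in scan(table) if row["name"] != name]


if __name__ == "__main__":
    users = {2: {"name": "tom"}, 1: {"name": "ann"}, 3: {"name": "bob"}}
    print(select_where_name_not(users, "tom"))
```

The storage engine only has to provide ordered iteration and point lookups; the SQL layer does the filtering on top.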

trungonnews · 14y ago · 3 in thread

How is this different from Membase?

eis · 14y ago

Uhm, shouldn't that be obvious from reading the high-level descriptions of each? They are for completely different use cases. Membase is a distributed key/value server and LevelDB is a key/value library.

thomas11 · 14y ago

I see the difference between a server and a library, but both can often be used for the same use case. Just recently I evaluated a few data stores for a project and I didn't care all that much about the distinction. For the servers you're gonna use an API for your programming language anyway, so the programming model isn't that different.

stonemetal · 14y ago

About like the difference between MySQL and SQLite.

jcapote · 14y ago · 2 in thread

It would be cool to make a LevelDB-backed fork of Redis.

stanleydrew · 14y ago

Pardon my ignorance, but what's backing Redis currently?

stephth · 14y ago

Redis. :) But more importantly, it runs as a server. I think what @jcapote meant is being able to use Redis operations without a server, like LevelDB or SQLite. I would love to see that. There's already some effort in that direction [1], but using a Google-backed project instead of building a full library from the ground up could be a saner approach.

[1] https://github.com/seppo0010/redislite

timr · 14y ago · 2 in thread

http://www.hnsearch.com/search#request/all&q=leveldb

swah · 14y ago

Your point?

timr · 14y ago

That leveldb has been discussed several times on HN in the last two months. I just didn't break out the links from the search UI.

Downvoters: links to previous context are generally considered a good thing here.

mumrah · 14y ago · 2 in thread

Anyone know how LevelDB compares to Voldemort? From a cursory glance, they are identical in their simple API (get, put, delete)

strlen · 14y ago

Voldemort developer here--

Voldemort is to LevelDB what MySQL is to InnoDB: Voldemort is a distributed system that allows multiple engines to be plugged in. Most commonly, companies use either BerkeleyDB or MySQL as a storage engine. LinkedIn, Mendeley, eBay and others also use the read-only storage engine, where the data is pre-built in Hadoop and loaded into Voldemort.

I am really excited about LevelDB: while there are higher priority projects on my plate right now, we'd very much like to see a LevelDB storage engine. If anyone is interested in contributing one, they're welcome.

The steps are:

1) Creating JNI bindings to LevelDB (or creating a .so version of LevelDB and creating JNA bindings)

2) Implementing the StorageEngine interface with the bindings, including passing in any configuration.

Here is an example of a third-party InnoDB/HailDB storage engine for Voldemort:

https://github.com/sunnygleason/v-storage-haildb

gaborcselle · 14y ago

My understanding is that Voldemort is a distributed key-value storage system, while LevelDB is a local on-disk key-value storage system.

rektide · 14y ago · 1 in thread

Open sourced as of March 18th, 2011: http://code.google.com/p/leveldb/source/browse/trunk/LICENSE...

That initial checkin: http://code.google.com/p/leveldb/source/detail?r=2

gaborcselle · 14y ago

Yes, we put up the Google Code site in incognito mode back then, but have since added a number of bugfixes and optimizations, so we're actually comfortable announcing the project now.

skanuj (OP) · 14y ago · 1 in thread

Note to self: search before you post. Apologies, I checked only the new and front pages!

dchest · 14y ago

You were right to post it -- it's a new blog post announcing the non-beta release and benchmarks.

dchest · 14y ago

See also:

Previous discussion - http://news.ycombinator.com/item?id=2526032

Benchmarks vs Kyoto TreeDB and SQLite3 - http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html (discussion - http://news.ycombinator.com/item?id=2813061)

Benchmarks vs InnoDB - http://blog.basho.com/2011/07/01/Leveling-the-Field/

stonemetal · 14y ago

Sorry if this is a bit off topic, but it seems to me like most of Google's open source projects are more source-available than open source. Do they actually take contributions from the community, or are they all like Android, with source made available when it's "ready for public consumption"?

LevelDB sounds like something I would like to contribute to, but if the reception is going to be chilly I won't bother and will maybe pick up Mongo or Redis instead.

jchrisa · 14y ago

Bindings for Node.js if anyone is interested: https://github.com/creationix/node-leveldb

stephth · 14y ago

From the announcement: it has already been ported to a variety of Unix based systems, Mac OS X, Windows, and Android.

It's worth noting that the makefile includes options to build for iOS. I've successfully done it and my next iOS app will include LevelDB. Also worth noting: thanks to iOS devices' SSDs, it's much faster than on traditional HDD machines.

taylorbuley · 14y ago

Upcoming versions of the Chrome browser include an implementation of the IndexedDB HTML5 API that is built on top of LevelDB

Really excited about seeing IndexedDB run atop of this

swah · 14y ago

I love the insight about how fast compression (Snappy) is like having faster hard drives.

newman314 · 14y ago

How well does LevelDB work for a mobile device? This might be a nice use case.

overred · 14y ago

LSM-Tree is good!

swah · 14y ago

So that's what Jeff does!
