HyperDex: Super-charging your NoSQL with a Strongly Consistent Datastore

(micrypt.com)

45 points · ziyadb · 13y ago · 13 comments

nateberkopec · 13y ago

NoSQL/SQL trolling aside, doing data lookup in a hyperspace is such a great idea. Being able to slice my data on hyperplanes is such a useful and cool feature.

jandrewrogers · 13y ago

In fact, database theory was originally conceived as a hyper-dimensional metric space in which both data and selection could be mapped to hyper-rectangles in the space. A hyper-plane or a point in the metric space is just a degenerate hyper-rectangle with the same dimensionality as the metric space. It has always been the most elegant way to represent a database.
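To make the hyper-rectangle view concrete, here is a minimal Python sketch. It is not taken from any database mentioned in this thread: the `HyperRect` class, the `(age, salary)` attributes, and the `select` helper are all hypothetical names chosen for illustration. Records are points, a predicate on one attribute is a hyper-plane, and selection is just a rectangle intersection test.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

INF = float("inf")


@dataclass(frozen=True)
class HyperRect:
    """Axis-aligned hyper-rectangle: one closed [lo, hi] interval per dimension."""
    bounds: Tuple[Tuple[float, float], ...]

    @classmethod
    def point(cls, coords: Sequence[float]) -> "HyperRect":
        # A point is a degenerate hyper-rectangle: lo == hi on every axis.
        return cls(tuple((c, c) for c in coords))

    def intersects(self, other: "HyperRect") -> bool:
        # Two rectangles overlap iff their intervals overlap on every axis.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for (a_lo, a_hi), (b_lo, b_hi) in zip(self.bounds, other.bounds))


def select(query: HyperRect, data):
    """Selection = keep every record whose rectangle intersects the query."""
    return [r for r in data if query.intersects(r)]


# Hypothetical records with two attributes: (age, salary).
rows = [
    HyperRect.point((30, 50_000)),
    HyperRect.point((45, 80_000)),
    HyperRect.point((30, 90_000)),
]

# "age == 30" is a hyper-plane: degenerate on age, unbounded on salary.
age_is_30 = HyperRect(((30, 30), (-INF, INF)))

# "40 <= age <= 50 AND salary >= 70000" is an ordinary hyper-rectangle.
range_query = HyperRect(((40, 50), (70_000, INF)))
```

With these definitions, `select(age_is_30, rows)` keeps the two records with age 30 and `select(range_query, rows)` keeps only the (45, 80000) record. The linear scan is only there to show the semantics; the hard part discussed in this thread is indexing such rectangles efficiently.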

One of the best-written expositions of databases properly expressed as operations over hyper-dimensional spaces was produced by Rudolf Bayer, the guy who co-invented the B-Tree. In the late 1990s he invented a multidimensional indexing structure based on space-filling curves called a UB-Tree, and he and his co-authors wrote at length about how various database operations are implemented using that representation. It is generally informative if you are unfamiliar with this aspect of database theory, not just in the context of UB-Trees about which it was written.

If it is such a great idea then why does no major database implement things this way? Despite several attempts by companies like Oracle, IBM, and Microsoft, no one has described a generalized data structure for databases with hyper-rectangle operands as its primitives. There are dozens of narrow algorithmic solutions known, both published and unpublished, but none that you could legitimately use in a commercial database system because they all have limitations that will adversely affect real-world applications.

HyperDex is a conventional algorithm from the standpoint of indexing hyper-dimensional spaces. They are not doing anything new there that has not been done before. However, the update value chaining element of it is actually pretty neat.

egs · 13y ago

I need to add two factoids to this insightful discussion:

There is a big difference between space-filling curves and hyperspace hashing. Space-filling curves map a multidimensional space to a single path through that space that is then mapped to nodes. In the process, they do not retain locality. To our knowledge, hyperspace hashing is a direct intellectual descendant of consistent hashing and has not been done before. If you have pointers to work where data is mapped to nodes in a cluster using a multidimensional hash, please send them to us!
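The contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not HyperDex's actual code: the bucket count, the SHA-1 based `axis_bucket`, the attribute names, and all function names are hypothetical. Hyperspace hashing is modelled as hashing each attribute to a coordinate on its own axis, so an object lands in one grid region and a search on a subset of attributes only touches the regions in the matching slice. `morton_2d` shows the space-filling-curve alternative, which first flattens the grid into a single one-dimensional path.

```python
import hashlib
from itertools import product

BUCKETS_PER_AXIS = 4  # assumption: each axis is cut into 4 intervals
ATTRS = ["first", "last", "phone"]  # hypothetical searchable attributes


def axis_bucket(value) -> int:
    """Deterministically hash one attribute value to a coordinate on its axis."""
    digest = hashlib.sha1(repr(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % BUCKETS_PER_AXIS


def region_of(obj: dict, attrs=ATTRS) -> tuple:
    """An object maps to exactly one region: one coordinate per attribute."""
    return tuple(axis_bucket(obj[a]) for a in attrs)


def regions_for_search(partial: dict, attrs=ATTRS) -> list:
    """Specified attributes pin a coordinate; unspecified ones span the whole axis."""
    axes = [
        [axis_bucket(partial[a])] if a in partial else range(BUCKETS_PER_AXIS)
        for a in attrs
    ]
    return list(product(*axes))


def morton_2d(x: int, y: int, bits: int = 2) -> int:
    """Z-order (Morton) code: interleave the bits of x and y into one position."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


obj = {"first": "John", "last": "Smith", "phone": "555-0100"}
by_last_name = regions_for_search({"last": "Smith"})  # 16 of the 64 regions
full_scan = regions_for_search({})                    # all 64 regions
```

In this toy model a lookup by last name contacts 16 regions instead of 64, and the region holding `obj` is always among them. How regions are assigned to physical nodes is left out entirely.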

And one major reason why multidimensional databases failed to take off is a problem known as "the curse of dimensionality." If you implement a multi-dimensional representation naively, high-dimensional data (say, an object with 10-20 attributes) will require a large number of nodes to be efficient. HyperDex solves this through something called space partitioning (I think the paper calls it "data partitioning," but we've changed the name to be a bit more descriptive). The partitions are kind of analogous to materialized views, very loosely speaking.
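A back-of-the-envelope calculation shows why the naive approach blows up. The model below is a deliberate simplification and an assumption, not HyperDex's actual sizing logic: every axis is split into the same number of buckets, a single space needs one region per combination of buckets, and a partitioned layout is treated as several independent lower-dimensional spaces. The function names are hypothetical.

```python
def regions_single_space(num_attrs: int, buckets_per_axis: int = 2) -> int:
    """One space over all attributes: regions grow exponentially with dimensions."""
    return buckets_per_axis ** num_attrs


def regions_partitioned(subspace_sizes, buckets_per_axis: int = 2) -> int:
    """Several smaller spaces: the per-subspace region counts add up instead."""
    return sum(buckets_per_axis ** k for k in subspace_sizes)


# 20 attributes, each axis split in two.
naive = regions_single_space(20)          # 2**20 = 1,048,576 regions
split = regions_partitioned([4] * 5)      # 5 subspaces of 4 attributes: 5 * 2**4 = 80
```

Even with only two buckets per axis, a single 20-dimensional space needs over a million regions, while five 4-dimensional subspaces need 80 in total. The materialized-view analogy in the comment suggests the trade-off: the same data is presumably represented once per partition.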

Agreed completely that hyperspace hashing comes into its own when coupled with value-dependent chaining!

karterk · 13y ago

I am in the middle of evaluating HyperDex. It's still early stages, but it's very promising when you want to slice and dice your data on multiple dimensions.

Having said that, I had trouble stopping and/or restarting the cluster in a clean way. For me to even consider using it in production, it should also offer me ways to back up data as well as convince me that future upgrades will be somewhat smooth.

egs · 13y ago

Not sure when you last looked at it, but we added the ability to cleanly shut down and restart the cluster two releases ago. We are committed to providing a smooth upgrade path as well; that is not to say we will always be binary-compatible, as the next release involving the disk layer will change the on-disk representation, but we will always provide automatic upgrade scripts. So, if you last tested 0.29b or so, do check out the latest code in the repo.

cmancini · 13y ago

I was wondering how long it would take for HyperDex to get some traction on HN. While I haven't had immediate need to use it myself, I have several friends who attest that it is a lovely system. The article doesn't mention it, but it's wicked fast, outperforming Redis in many metrics. Clearly the consistency is the big kicker though. So many systems are saying eventual consistency is "good enough." Clearly that can be fine for a SoLoMo app, but probably not for a medical system.

nicolast · 13y ago

Lack of consistency guarantees in most systems available in the open is the exact reason why we created Arakoon: our usage scenarios (safekeeping metadata for large-scale storage systems, among others) simply don't allow for 'eventual consistency'. It's also available as free software, check it out at http://arakoon.org.

jperras · 13y ago

Neat. From a very cursory glance at the HyperDex paper, it looks like a distributed kd-tree with the addition of a single-dimensional key subspace (to use their terminology). The really clever part, I believe, is the value-dependent chaining for the deterministic propagation of changes/deletion of objects.

Very cool.

egs · 13y ago

Thanks! Unlike a kd-tree or B-tree variants, HyperDex does not build an auxiliary data structure. It turns out that keeping aux data structures in sync with the data is very difficult if you want to provide strong consistency guarantees. Hyperspace hashing is purely a mapping trick, not a distributed data-structure trick.

Agreed with you fully that value-dependent chains are neat. They allow the system to replicate and relocate data, without any need for background processes. VDCs are the key to HyperDex's strong consistency guarantees.

sbhat7 · 13y ago

Previous and relevant discussions:

http://news.ycombinator.com/item?id=3622059

https://groups.google.com/forum/?fromgroups#!topic/redis-db/...

petercooper · 13y ago

Bypassing the content, I love the idea of having a suggested soundtrack for a blog post! - along with the link to actually hear it :-)

peterwwillis · 13y ago

https://www.youtube.com/watch?v=kTf-iAWfvS4

I think it's weird when people believe there's a tool that will do their job for them, like a hammer that builds a roof by itself.

I'm sure HyperDex is totally useful for some cases, but it has clear disadvantages when you try to use it for what it wasn't intended (like global HP databases). All of a sudden you find yourself building glue to make it fit with your hybrid architecture. Instead you could take something simple and customize it, and build a huge successful business off of it, like the biggest sites in the world do currently with various tools that weren't engineered to solve simple problems like the number of round trips to look up an object.
