Z-order curve usage to decrease dimensionality to 1 (opens in new tab)

(ssahinkoc.blogspot.com)

105 pointsdelidumrul8y ago34 comments

34 comments

Here is a writeup on Google's S2 library for considering addressing the surface of the Earth as 1D, using Hilbert Curves.

http://blog.christianperone.com/2015/08/googles-s2-geometry-...

And 2015 HN thread: https://news.ycombinator.com/item?id=10066616

MarkMMullin8y ago

My favorite has always been Serpinski curve - the math behind Philip Jose Farmer's Riverworld :-) Here, you can print one yourself :-) https://www.thingiverse.com/thing:622627

jhj8y ago

Usually a Morton ordering is used for things like improving average memory locality of N-dimensional data (e.g., loop iteration order, data layout, ...). But it is average locality that is being improved, because of huge jumps between many neighbors.

In a small number of dimensions, without knowing what search algorithm is being used, this is just more work than comparing the original values. It doesn't mention what "k-NN algorithm" is being used, beyond brute force search.

Lossily compressing N-dimensional data (from 2 to 1000s of dimensions) into a representation that requires fewer bits can be done via quantization as well, either scalar quantization, vector quantization (aka k-means) or product quantization, if your data has known statistics.

It also matters if you are building a static data structure that is queried many times, versus one that needs continual updating.

mcphage8y ago

As the first commenter on the site pointed out, the Hilbert Curve is probably a better choice (https://en.m.wikipedia.org/wiki/Hilbert_curve)

jandrewrogers8y ago

Hilbert curve is only useful if the sole goal is sequential storage of point data. Z-curves are superior in almost every other case in real systems due to their unique and efficient computational properties, which many computer scientists are only vaguely aware of.

And since modern spatial database architectures don't sequentialize storage along the curve (because it doesn't make sense as a matter of engineering), the sole selling point of Hilbert curves is moot. You shouldn't design most systems in a way that could exploit the benefits of a Hilbert curve.

DocSavage8y ago

Could you elaborate on your comment about modern spatial databases not sequentializing storage along the curve? I would imagine parallel access across the curve, but wouldn't you want some reasonable sequential access at each cluster node to maximize IO speeds?

I've read your blog entries on SpaceCurve (http://www.jandrewrogers.com/2015/10/08/spacecurve/), found them very interesting but also just whetting the appetite. Are there no public reviews or papers covering discrete topology/sharding?

1 more reply

hex126484308y ago

The big advantage of Z-order curves is that the addressing computation is very cheap, which is why it's used a lot in computer graphics.

leni5368y ago

While I agree that Z-order curves are simpler, but it's fast to calculate Hilbert curves on modern CPUs too. Just to self plug:

https://github.com/leni536/fast_hilbert_curve

I only implemented the index->XY calculation yet. It compiles to 36 instructions without any branches and takes up 86 bytes.

https://github.com/leni536/fast_hilbert_curve/wiki/How-effic...

I think I can apply the same tricks for the inverse function too.

1 more reply

oppositelock8y ago

Hilbert curves are used in a lot of graphics too. Heck, the old SGI Octane with Vpro graphics used a recursive Hilbert curve rasterizer. They show up a lot today in geospatial big-data since hilbert addresses make good shard keys.

1 more reply

vanderZwan8y ago

H-curves are probably optimal with regards to locality, but good luck finding an implementation:

http://www.akt.tu-berlin.de/fileadmin/fg34/publications-akt/...

https://www.reddit.com/r/ProgrammerHumor/comments/4xzi9a/oh_...

a_e_k8y ago

It doesn't need to be that complicated at all. The book "Hacker's Delight" gives some nice, fairly simple implementations. Here's one:

http://www.hackersdelight.org/hdcodetxt/hilbert/lams.c.txt

1 more reply

yarg8y ago

Moore curves are also worth considering.

DocSavage8y ago

Space-filling curves have been studied as a way to reduce N-dimensional space to 1-D. They are very useful for applications like maximizing sequential access in a N-D datastore. A recent SIGMOD paper analyzed space-filling curves impact on data access: http://dl.acm.org/authorize.cfm?key=N37709

QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and Skew-Tolerant Space-Filling Curves Shoji Nishimura (NEC Corporation); Haruo Yokota (Tokyo Institute of Technology)

It discusses C-Curve, Z-Curve, and Hilbert curves.

albipenne8y ago

Here's a great write up on how you can use Z-curves to do multidimensional sorting using redis otherwise 1 dimensional sorted set datastructure. It's one of the best hands on examples on a way to solve real problems using this tech within your stack.

https://redis.io/topics/indexes

zocoi8y ago

Can someone help explaining why this hash method could improve distance calculation for k-NN? What does it improve compared with Geohash or k-d tree structure?

imron8y ago

> What does it improve compared with Geohash

From what I can tell, it's the exact same algorithm used by Geohash.

mmalone8y ago

Yep. Geohash is just a fancy name for a z-curve.

mmalone8y ago

1. A geohash is a z-curve.

2. It won't be better than a k-d tree. Dimensionality reduction is usually done when you have really truly huge numbers of dimensions that are sparsely populated and you don't care much about some information loss (e.g., for machine learning) or, in this case, when you have an easy way to create a single dimensional index and you want to force multi-dimensional data into it. In the general case a k-d tree would be objectively better in terms of performance.

AstralStorm8y ago

And in very many or highly sparse dimensions with few or lazy updates, R-tree or derivatives.

minflynn8y ago

I've used Z-order curves and related curves in Neuroevolution research. It allows conversion of an adjacency matrix to a spatial subtrate representation that preserves locality as it grows (Thus going from 1 dimension to 2 or 3). This technique is an alternative for HyperNEAT or ES-HyperNEAT and experiments demonstrate higher performance that ES-HyperNEAT on the modular retina task problem.

mmalone8y ago

If you're tempted to use space filling curves (z-curves, hilbert curves, etc.) you should take a look at simple multi-dimensional data structures as an alternative.

As a couple of people have mentioned in this thread, space filling curves aren't great at preserving locality (i.e., two points that are "close together" in two dimensional space might end up being "far apart" in one dimensional space, and vice versa). A k-d tree is easy to code up and, in general, will be more efficient for queries like k-NN than dimensionality reduction because it's better at preserving locality.

There are also good libraries for multi-dimensional data structures for pretty much any mainstream language.

aeroevan8y ago

AKA a geohash (for the lat/lon example), but can easily be extended to other bounded dimensions.

Joboman5558y ago

This stuff mostly goes over my head, but I'd like to understand it. In the original method, why is op concatenating the two positions into one number, rather than just storing the lat/lon pair and using something like Euclidean distance?

leitasat8y ago

As the commentators on the website mentioned, this method does not preserve the closeness of the points (when going from 2D to 1D), so it is unclear how can it help the author.

mmalone8y ago

True, but it is much better at preserving closeness than the alternative he mentioned (lexicographical sort).

teddyh8y ago

So how would https://xkcd.com/195/ look using a z-curve instead?

VMG8y ago

I'd imagine it would look less continuous: some of the areas would be ripped apart and spread across the map.

sp3328y ago

On a lark I plugged this into Google https://encrypted.google.com/search?num=20&hl=en&q=https%3A%... and if you peer at the results, you can see that Google is highlighting "Hilbert curve" as if it were matching one of my search terms. I guess that its algorithm has "learned" that Hilbert curve is a synonym for z-order curve?

j / k navigate · click thread line to collapse

34 comments

dweekly8y ago

Here is a writeup on Google's S2 library for considering addressing the surface of the Earth as 1D, using Hilbert Curves.

http://blog.christianperone.com/2015/08/googles-s2-geometry-...

And 2015 HN thread: https://news.ycombinator.com/item?id=10066616

MarkMMullin8y ago

My favorite has always been Serpinski curve - the math behind Philip Jose Farmer's Riverworld :-) Here, you can print one yourself :-) https://www.thingiverse.com/thing:622627

jhj8y ago

It also matters if you are building a static data structure that is queried many times, versus one that needs continual updating.

mcphage8y ago

As the first commenter on the site pointed out, the Hilbert Curve is probably a better choice (https://en.m.wikipedia.org/wiki/Hilbert_curve)

jandrewrogers8y ago

DocSavage8y ago

1 more reply

hex126484308y ago

The big advantage of Z-order curves is that the addressing computation is very cheap, which is why it's used a lot in computer graphics.

leni5368y ago

While I agree that Z-order curves are simpler, but it's fast to calculate Hilbert curves on modern CPUs too. Just to self plug:

https://github.com/leni536/fast_hilbert_curve

I only implemented the index->XY calculation yet. It compiles to 36 instructions without any branches and takes up 86 bytes.

https://github.com/leni536/fast_hilbert_curve/wiki/How-effic...

I think I can apply the same tricks for the inverse function too.

1 more reply

oppositelock8y ago

1 more reply

vanderZwan8y ago

H-curves are probably optimal with regards to locality, but good luck finding an implementation:

http://www.akt.tu-berlin.de/fileadmin/fg34/publications-akt/...

https://www.reddit.com/r/ProgrammerHumor/comments/4xzi9a/oh_...

a_e_k8y ago

It doesn't need to be that complicated at all. The book "Hacker's Delight" gives some nice, fairly simple implementations. Here's one:

http://www.hackersdelight.org/hdcodetxt/hilbert/lams.c.txt

1 more reply

yarg8y ago

Moore curves are also worth considering.

DocSavage8y ago

QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and Skew-Tolerant Space-Filling Curves Shoji Nishimura (NEC Corporation); Haruo Yokota (Tokyo Institute of Technology)

It discusses C-Curve, Z-Curve, and Hilbert curves.

albipenne8y ago

https://redis.io/topics/indexes

zocoi8y ago

Can someone help explaining why this hash method could improve distance calculation for k-NN? What does it improve compared with Geohash or k-d tree structure?

imron8y ago

> What does it improve compared with Geohash

From what I can tell, it's the exact same algorithm used by Geohash.

mmalone8y ago

Yep. Geohash is just a fancy name for a z-curve.

mmalone8y ago

1. A geohash is a z-curve.

AstralStorm8y ago

And in very many or highly sparse dimensions with few or lazy updates, R-tree or derivatives.

minflynn8y ago

mmalone8y ago

If you're tempted to use space filling curves (z-curves, hilbert curves, etc.) you should take a look at simple multi-dimensional data structures as an alternative.

There are also good libraries for multi-dimensional data structures for pretty much any mainstream language.

aeroevan8y ago

AKA a geohash (for the lat/lon example), but can easily be extended to other bounded dimensions.

Joboman5558y ago

leitasat8y ago

As the commentators on the website mentioned, this method does not preserve the closeness of the points (when going from 2D to 1D), so it is unclear how can it help the author.

mmalone8y ago

True, but it is much better at preserving closeness than the alternative he mentioned (lexicographical sort).

teddyh8y ago

So how would https://xkcd.com/195/ look using a z-curve instead?

VMG8y ago

I'd imagine it would look less continuous: some of the areas would be ripped apart and spread across the map.

sp3328y ago

j / k navigate · click thread line to collapse