What's the advantage of Uber's approach over any of this? Even my primitive geohash solution worked pretty nicely, and you can implement it on just about any type of DB. I had a simple algorithm to cover any shape with geohashes of a certain size, as well as a quick way to generate a polygon for a circle. The two combined let me do radius searches for any shape overlapping or contained by the circle using simple terms queries on the geohash prefixes. My main headache was keeping the number of terms in a query (i.e. the number of geohashes) to a reasonable size (below 1024, which if I recall was the default limit).
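To make the prefix idea concrete, here's a minimal from-scratch geohash encoder using the standard base32 alphabet; the coordinates and precision are just illustrative, and this sketch is mine, not the poster's actual implementation:

```python
# Standard geohash base32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    """Encode a lat/lon pair by bisecting ranges and interleaving bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    even = True          # geohash interleaves bits, starting with longitude
    ch, n, out = 0, 0, []
    bit_vals = (16, 8, 4, 2, 1)
    while len(out) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch |= bit_vals[n]
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch |= bit_vals[n]
                lat_lo = mid
            else:
                lat_hi = mid
        even = not even
        if n < 4:
            n += 1
        else:
            out.append(BASE32[ch])
            ch, n = 0, 0
    return "".join(out)

print(geohash(51.5074, -0.1278, 5))  # → "gcpvj" (central London)
```

Points inside the same cell share a geohash prefix, so a coarse "within this area" search reduces to a terms query on a handful of prefixes.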
What I get from their explanation is that the hexagon is a better shape for map grids because it is the regular polygon with the most sides that can still tessellate the plane (the other two are triangles and squares). Since hexagons are closer to a circle, distances within a cell are more stable, and the distance from a cell's center to each of its neighbors is constant as well.
I think the reason hex grids are not more common is that subdivisions are hard to create compared to triangles or squares. Uber solved this by subdividing each hexagon into seven smaller hexagons and tilting them so they cover the bigger shape with some small overlap.
Also, a big problem is distortion. I never thought this would be that huge of a problem, but it makes sense. They go into a lot of the details later in the same video.
[Penrose tiling intensifies]
But if you look at how they overlay London, you get quite a split higher up, and two cells next to each other on the map don't necessarily share a prefix, so I can see where the number of terms would get big.
The new Elasticsearch implementation is now the default, and I think they are deprecating the prefix-based versions like geohash.
Too bad their stuff is not embeddable into an app. Know of a lib for this?
Update with link: https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/...
Representation systems for geospatial data models is an amazingly deep theoretical rabbit hole. Common systems are almost always optimized for presentation as most were originally designed for cartographic use cases. If you were looking at representation systems optimized for fast, scalable geospatial analytics, for example, you'd use some type of 3-space embedding representation. There is a lot of diversity.
H3 also has efficient means for finding a cell’s neighbors, and comes with some nice algorithms - like the “compact” fill. See https://uber.github.io/h3/#/documentation/overview/use-cases
The hierarchical search data is built into a cell's ID in both H3 and S2, which helps when comparing IDs to see how close they are to each other.
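As a toy illustration of why hierarchical IDs help (this is not the actual S2 or H3 bit layout, just a generic quadtree sketch): if each level appends two bits choosing one of four children, an ancestor check becomes a simple prefix test.

```python
# Toy quadtree cell ID: a bit string that grows by 2 bits per level.
def child(cell_bits, quadrant):
    # quadrant in 0..3 selects one of the four sub-squares
    return cell_bits + format(quadrant, "02b")

def is_ancestor(parent, cell):
    # a parent's ID is a prefix of all of its descendants' IDs
    return cell.startswith(parent)

root = ""
c1 = child(root, 2)         # "10"
c2 = child(c1, 3)           # "1011"
print(is_ancestor(c1, c2))  # True
print(is_ancestor(c2, c1))  # False
```

Real cell IDs pack this into a fixed-width integer, but the prefix property is the same idea.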
For visualization, I prefer the way H3 looks over S2. While that's just an opinion, the H3 grid is exposed directly to drivers and in lots of tools.
I work for Uber and have used H3, but don’t work on the library.
Ah, yes this makes a lot of sense. The video in the link actually does have a nice comparison at 17:30, where this is called out. It seems to me to be the most compelling argument for hexagons.
I use this library every day and absolutely love it.
Bear in mind that H3 only has 16 levels of resolution, so traversing from the top level down to the square-metre level is still a constant-time operation, i.e. O(16).
This post goes some way towards explaining why hexagons:
* "minimize the quantization error introduced when users move through a city"
* "allow us to approximate radiuses easily"
* non-hexagonal regions (such as postal code boundaries) "have unusual shapes and sizes which are not helpful for analysis, and are subject to change"
My favorite quote in the video
S2 cells distort quite heavily depending on which part of the globe you're mapping.
Then we can say if you need a package collecting at address X, there are usually Y available couriers within Z distance. Combine that with a road mapping/traffic API and you're done.
https://github.com/uber/h3/issues/258
(Note: by "massively" I mean pushing it to O(1) for the fundamental coordinate operations. So, perhaps read that as "most massively".)
Which is it - an inaccurate image, and the system is really hexagon-based, or does the system also use pentagons? My guess is that it is actually all hexagon-based since the overlap between granularity is already only approximate.
Which isn't to say that you couldn't tile the planet with triangles, but they point out that the consistent relationship between neighboring hexagon tiles is useful:
>Using a hexagon as the cell shape is critical for H3. As depicted in Figure 6, hexagons have only one distance between a hexagon centerpoint and its neighbors’, compared to two distances for squares or three distances for triangles. This property greatly simplifies performing analysis and smoothing over gradients.
As you noticed, you can't divide a hexagon into 7 regular hexagons either. But it's apparently close enough:
>H3 supports sixteen resolutions. Each finer resolution has cells with one seventh the area of the coarser resolution. Hexagons cannot be perfectly subdivided into seven hexagons, so the finer cells are only approximately contained within a parent cell.
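The "one seventh" falls out of hex-lattice arithmetic: children sit on a sub-lattice generated by the lattice vector (2, 1), whose squared norm under the hex metric is a² + ab + b² = 7, and whose angle gives the tilt between resolutions. A sketch of that arithmetic (generic lattice math, not H3's actual indexing code):

```python
import math

# Sub-lattice generator (a, b) = (2, 1) in hex lattice coordinates.
a, b = 2, 1
aperture = a * a + a * b + b * b               # area ratio between resolutions

# Cartesian components of that lattice vector (unit cell spacing,
# basis vectors at 60 degrees).
vx = a + b / 2
vy = b * math.sqrt(3) / 2
tilt_deg = math.degrees(math.atan2(vy, vx))    # rotation between resolutions

print(aperture)             # 7
print(round(tilt_deg, 4))   # 19.1066 degrees
```

That ~19.1° rotation is the "tilting" mentioned elsewhere in the thread, and it is exactly why a parent hexagon only approximately contains its seven children.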
Now that image sensors and e.g. smartphone displays are very high resolution, and most content ends up going through multiple resampling routines on its way from capture -> storage/processing -> display, I would love to see people start to experiment with hexagon-grid cameras and displays. The slightly more complicated resampling wouldn’t really be a big deal for modern DSP hardware, and the visual output could be significantly better for the same pixel count.
It should be possible to turn this projection into quite a nice 3D-printable model. Challenging, though. One could probably split the sphere into roughly equal hexagons for printing and then assemble them.
(I mention it mostly because I think it's an interesting little mathematical factoid.)
This can be nicer in some cases: the edge case your hexagon-grid algorithms have to deal with is a hexagon that has one of the same neighbors twice, instead of needing to worry about pentagons per se.
I had analogous experiences every time I asked a question there. One would ask a very clear question like, say, "how do I print to stdout in C?" And the first or second answer you get is inevitably about taking input from stdin. Or polymorphism.
The nice thing about S2 is that it subdivides cleanly: a square can be composed of smaller squares, while hexagons can't be. This property makes S2 much more broadly useful across a bigger range of applications.
In H3, each hierarchical level of hexagons doesn't nest cleanly in the one above it. For Uber's uses this is acceptable because hexagons have more uniform adjacency, but the zoom-in/zoom-out math is pretty gnarly.
But even S2 has the funkiness of first mapping a sphere to a cube. They're both fairly interesting to read up on.
Squares and triangles are weird for distance, maybe, but they can hold precise subdivisions of the same shape inside each other. Hexagons "bleed", so why exactly do they boast so many times in the article that H3 is so great for nesting different detail levels? Even their diagrams show very bad bleeding (worse than it should be if optimized), and they never mention it in this summary.