What's the advantage of Uber's approach over any of this? Even my primitive geohash solution worked pretty nicely, and you can implement it on just about any type of DB. I had a simple algorithm to cover any shape with geohashes of a certain size, as well as a quick way to generate a polygon for a circle. The two combined let me do radius searches for any shape overlapping or contained by the circle using simple terms queries on the geohash prefixes. My main headache was keeping the number of terms in a query (i.e. the number of geohashes) to a reasonable size (below 1024, which if I recall was the default limit).
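To make the prefix idea concrete, here's a minimal from-scratch geohash encoder using the standard base32 alphabet; the coordinates and precision are just illustrative, and this sketch is mine, not the poster's actual implementation:

```python
# Standard geohash base32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    """Encode a lat/lon pair by bisecting ranges and interleaving bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    even = True          # geohash interleaves bits, starting with longitude
    ch, n, out = 0, 0, []
    bit_vals = (16, 8, 4, 2, 1)
    while len(out) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch |= bit_vals[n]
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch |= bit_vals[n]
                lat_lo = mid
            else:
                lat_hi = mid
        even = not even
        if n < 4:
            n += 1
        else:
            out.append(BASE32[ch])
            ch, n = 0, 0
    return "".join(out)

print(geohash(51.5074, -0.1278, 5))  # → "gcpvj" (central London)
```

Points inside the same cell share a geohash prefix, so a coarse "within this area" search reduces to a terms query on a handful of prefixes.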
What I get from their explanation is that the hexagon is a better shape for map grids because it is the regular polygon with the most sides that can still tessellate the plane (the other two are triangles and squares). Since hexagons are closer to a circle, distances within a cell are more stable, and the distance from a cell's center to each of its neighbors is constant as well.
I think the reason hex grids are not more common is that subdivisions are hard to create compared to triangles or squares. Uber solved this by subdividing each hexagon into seven smaller hexagons and tilting them so they cover the bigger shape with some small overlap.
Also, a big problem is distortion. I never thought this would be that huge of a problem, but it makes sense. They go into a lot of the details later in the same video.
[Penrose tiling intensifies]
But if you look at how they overlay London, you get quite a split higher up, and two cells next to each other on the map don't necessarily share a prefix, so I can see where the number of terms would get big.
The new Elasticsearch implementation is now the default, and I think they are deprecating the prefix-based versions like geohash.
Too bad their stuff is not embeddable into an app. Know of a lib for this?
Update with link: https://lucene.apache.org/core/7_1_0/core/org/apache/lucene/...
Representation systems for geospatial data models is an amazingly deep theoretical rabbit hole. Common systems are almost always optimized for presentation as most were originally designed for cartographic use cases. If you were looking at representation systems optimized for fast, scalable geospatial analytics, for example, you'd use some type of 3-space embedding representation. There is a lot of diversity.
H3 also has efficient means for finding a cell’s neighbors, and comes with some nice algorithms - like the “compact” fill. See https://uber.github.io/h3/#/documentation/overview/use-cases
The hierarchical search data is built into a cell's ID in both H3 and S2, which helps when comparing IDs to see how close they are to each other.
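As a toy illustration of why hierarchical IDs help (this is not the actual S2 or H3 bit layout, just a generic quadtree sketch): if each level appends two bits choosing one of four children, an ancestor check becomes a simple prefix test.

```python
# Toy quadtree cell ID: a bit string that grows by 2 bits per level.
def child(cell_bits, quadrant):
    # quadrant in 0..3 selects one of the four sub-squares
    return cell_bits + format(quadrant, "02b")

def is_ancestor(parent, cell):
    # a parent's ID is a prefix of all of its descendants' IDs
    return cell.startswith(parent)

root = ""
c1 = child(root, 2)         # "10"
c2 = child(c1, 3)           # "1011"
print(is_ancestor(c1, c2))  # True
print(is_ancestor(c2, c1))  # False
```

Real cell IDs pack this into a fixed-width integer, but the prefix property is the same idea.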
For visualization, I prefer the way H3 looks over S2. While that's just an opinion, the H3 grid is exposed directly to drivers and in lots of tools.
I work for Uber and have used H3, but don’t work on the library.
Ah, yes this makes a lot of sense. The video in the link actually does have a nice comparison at 17:30, where this is called out. It seems to me to be the most compelling argument for hexagons.
I use this library every day and absolutely love it.
Bear in mind that H3 only has 16 levels of resolution, so traversing from the top level down to the square-metre level is still a constant-time operation, i.e. O(16).
This post goes some way towards explaining why hexagons:
* "minimize the quantization error introduced when users move through a city"
* "allow us to approximate radiuses easily"
* non-hexagonal regions (such as postal code boundaries) "have unusual shapes and sizes which are not helpful for analysis, and are subject to change"
My favorite quote in the video
S2 cells distort quite heavily depending on which part of the globe you're mapping.
Then we can say if you need a package collecting at address X, there are usually Y available couriers within Z distance. Combine that with a road mapping/traffic API and you're done.
https://github.com/uber/h3/issues/258
(Note: by "massively" I mean pushing it to O(1) for the fundamental coordinate operations. So, perhaps read that as "most massively".)
Which is it - an inaccurate image, and the system is really hexagon-based, or does the system also use pentagons? My guess is that it is actually all hexagon-based since the overlap between granularity is already only approximate.
Which isn't to say that you couldn't tile the planet with triangles, but they point out that the consistent relationship between neighboring hexagon tiles is useful:
>Using a hexagon as the cell shape is critical for H3. As depicted in Figure 6, hexagons have only one distance between a hexagon centerpoint and its neighbors’, compared to two distances for squares or three distances for triangles. This property greatly simplifies performing analysis and smoothing over gradients.
As you noticed, you can't divide a hexagon into 7 regular hexagons either. But it's apparently close enough:
>H3 supports sixteen resolutions. Each finer resolution has cells with one seventh the area of the coarser resolution. Hexagons cannot be perfectly subdivided into seven hexagons, so the finer cells are only approximately contained within a parent cell.
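The "one seventh" falls out of hex-lattice arithmetic: children sit on a sub-lattice generated by the lattice vector (2, 1), whose squared norm under the hex metric is a² + ab + b² = 7, and whose angle gives the tilt between resolutions. A sketch of that arithmetic (generic lattice math, not H3's actual indexing code):

```python
import math

# Sub-lattice generator (a, b) = (2, 1) in hex lattice coordinates.
a, b = 2, 1
aperture = a * a + a * b + b * b               # area ratio between resolutions

# Cartesian components of that lattice vector (unit cell spacing,
# basis vectors at 60 degrees).
vx = a + b / 2
vy = b * math.sqrt(3) / 2
tilt_deg = math.degrees(math.atan2(vy, vx))    # rotation between resolutions

print(aperture)             # 7
print(round(tilt_deg, 4))   # 19.1066 degrees
```

That ~19.1° rotation is the "tilting" mentioned elsewhere in the thread, and it is exactly why a parent hexagon only approximately contains its seven children.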
Now that image sensors and e.g. smartphone displays are very high resolution, and most content ends up going through multiple resampling routines on its way from capture -> storage/processing -> display, I would love to see people start to experiment with hexagon-grid cameras and displays. The slightly more complicated resampling wouldn’t really be a big deal for modern DSP hardware, and the visual output could be significantly better for the same pixel count.
It should be possible to turn this projection into quite a nice 3D-printable model. Challenging, though. One could probably split the sphere into roughly equal hexagons for printing and then assemble them.
(I mention it mostly because I think it's an interesting little mathematical factoid.)
This can be nicer in some cases: the edge case your hexagon-grid algorithms have to deal with is a hexagon that has one of the same neighbors twice, instead of needing to worry about pentagons per se.
I had analogous experiences every time I asked a question there. One would ask a very clear question like, say, "how do I print to stdout in C?" And the first or second answer you get is inevitably about taking input from stdin. Or polymorphism.
The nice thing about S2 is that it subdivides cleanly: a square can be composed of smaller squares, while hexagons can't be. This property makes S2 much more broadly useful across a bigger range of applications.
In H3, each hierarchical level of hexagons doesn't nest cleanly in the one above it. For Uber's uses this is acceptable because hexagons have more uniform adjacency, but the zoom-in/zoom-out math is pretty gnarly.
But even S2 has the funkiness of first mapping a sphere to a cube. They're both fairly interesting to read up on.
Squares and triangles are weird for distance, maybe, but they can hold precise subdivisions of the same shape inside each other. Hexagons "bleed", so why exactly do they boast so many times in the article that H3 is so great for nesting different detail levels? Even their diagrams show very bad bleeding (worse than it should be if optimized), and they never mention it in this summary.