I don't get to do geospatial work as much anymore, but I would have killed for this just a year ago.
It will be great with some more options in this space, especially if it makes a smooth transition from single-node/local interactions to multi-node scale-out.
My bet is most of actually useful spatial ST_ functions are not implemented in this one, as they are not in the DuckDB offering.
In terms of number of functions PostGIS is still the leader, but for analytical functions (spatial relationships, distances, etc) having those in place in these systems is important. DuckDB started this but this has a spatial focused engine. You can use the two together, PostGIS for transactional processing and queries, and then SedonaDB for processing and data prep.
A combination of tools makes a lot of sense here especially as the data starts to grow.
Postgres made gigantic leaps in recent years - both in performance and feature-set. I don't think ever comparing the new contenders with daddy is fair. But then there are the DuckDB advocates who claim it pioneered spatial, which is so much not true.
Postgres is amazing system, which is also available free. We don;t have too many of these, and too many aging that well.
For example, if i wanted to define a 4d region called (fish, towel, mouse, alien) and there were floats for each of fish/towel/mouse/alien?
{ "type": "EngineeringCRS", "name": "Fish, Towel, Mouse", "datum": {"name": "Wet Kitty + Mouse In Peril"}, "coordinate_system": { "subtype": "Cartesian", "axis": [ {"name": "Fish", "abbreviation": "F", "direction": "east"}, {"name": "Towel", "abbreviation": "T", "direction": "north"}, {"name": "Mouse", "abbreviation": "M", "direction": "up"}, ] } }
(Subject to the limitations of PROJJSON, such as a 4D CRS having a temporal axis and a limited set of acceptable "direction" values)
I thought Apache Sedona is implemented in Java/Scala for distributed runtimes like Spark and Flink. Wouldn't Rust tooling for interactive use be built atop a completely different stack?
Surely they're the same? Two sedona projects is one thing, but two apache sedona projects is sheer madness?
1. The latitude/longitude ordering for points differs from PostGIS and most standard geospatial libraries, which creates friction due to muscle memory.
2. Anecdotal: spatial joins haven't matched PostGIS performance for similar operations, though this may vary by use case and data size.
3. The spatial extension has a backlog of long-standing GitHub issues.
What am I missing? The api even looks the same.
It comes a disappointment for me that SedonaDB hasn’t adopted a similar approach.
Apache stack provides everything needed, but for small things I would not prefer SQL exactly
While PostGIS is often used for spatial analytics because of its rich spatial function coverage, it is fundamentally a transactional database. This design makes it less suited for analytical query performance, and including it directly in SpatialBench would risk claims of being an “apples-to-oranges” comparison. That’s why we exclude PostGIS from the published benchmark results.
That said, we do continuously validate against PostGIS. For every single function in SedonaDB, we maintain an automated PyTest benchmark framework (https://github.com/apache/sedona-db/tree/main/benchmarks) that compares both speed and correctness against DuckDB and PostGIS. This ensures we catch regressions early and guarantees correctness. You can even run these benchmarks yourself to see how SedonaDB performs. It is often extremely fast in practice.
SedonaDB currently supports SQL, Python, R, and Rust APIs. We can support APIs for other languages in the future. That's another nice part about Rust. There are lots of libraries to expose other language bindings to Rust projects.
From the README:
> Update (August 2024): GeoPolars is blocked on Polars supporting Arrow extension types, which would allow GeoPolars to persist geometry type information and coordinate reference system (CRS) metadata. It's not feasible to create a geopolars. GeoDataFrame as a subclass of a polars. DataFrame (similar to how the geopandas. GeoDataFrame is a subclass of pandas.DataFrame) because polars explicitly does not support subclassing of core data types.
What does it do besides being written in Rust?
There are other good alternatives, such as GeoPandas and DuckDB Spatial. SedonaDB has Python/SQL APIs and is very fast. New features like full raster support and compatibility with lakehouse formats are coming soon!
As someone who has had to use geopandas a lot, having something which is up to an order of magnitude faster is a real dream come true.