The flexibility in defining rules through tuples helped us iterate rapidly on new product features. We used self-hosted Ory Keto [0] instances as the implementation, though we would have preferred a managed solution. We were checking out Auth0 Fine Grained Authorization [1] but it was still in Alpha back then.
[0]: https://www.ory.sh/keto/
[1]: https://auth0.com/developers/lab/fine-grained-authorization
We completely agree here, which is why we initially started out with our managed cloud offering, Warrant Cloud[1]. While Zanzibar is powerful, operating it with solid latency/availability can be quite challenging.
Ory does have a managed service offering now for Ory Keto as well!
Now, a genuine question: why try to shoehorn a freeform graph (because the list of relationships is not hardcoded) into a relational DB instead of using a graph DBMS like Neo4j, Apache Jena (Fuseki), etc.? From looking at the source code briefly [1], I didn’t see any extreme SQL optimizations. This suggests to me that Warrant would either support a very limited set of query types or be very slow on quite a few of them. Also see the “billion triple challenge” in academia around this.
Good luck with your startup!
[0]: https://www.w3.org/TR/rdf11-primer/
[1]: https://github.com/warrant-dev/warrant/tree/main/pkg/authz/o...
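For anyone wondering what "a graph in a relational DB" looks like in practice, here is a minimal sketch using a recursive CTE in SQLite. To be clear, this is not Warrant's schema or query logic: the `rel` table, the `build_db` and `reachable` helpers, and the example tuples are all made up for illustration, and the traversal ignores relation types entirely. It only shows that transitive traversal is expressible in plain SQL; whether it stays fast at scale is exactly the open question.

```python
import sqlite3

# Hypothetical relationship tuples: (object, relation, subject).
EXAMPLE_TUPLES = [
    ("group:eng", "member", "user:alice"),
    ("folder:docs", "viewer", "group:eng"),
    ("doc:readme", "parent", "folder:docs"),
]


def build_db(tuples):
    """Load (object, relation, subject) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rel (object TEXT, relation TEXT, subject TEXT)")
    conn.executemany("INSERT INTO rel VALUES (?, ?, ?)", tuples)
    return conn


def reachable(conn, start):
    """Return every node reachable from `start` by following subject -> object edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) deduplicates, so cycles terminate
            SELECT rel.object FROM rel JOIN walk ON rel.subject = walk.node
        )
        SELECT node FROM walk WHERE node != ?
        """,
        (start, start),
    ).fetchall()
    return {row[0] for row in rows}
```

Calling `reachable(build_db(EXAMPLE_TUPLES), "user:alice")` walks from the user through the group and folder to the document. A real authorization check would also have to filter on relation types at each hop, which is where the queries get more involved.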
I hate it. It's extremely expensive. It's slow. Very slow. It only recently gained support for multiple databases per instance. It doesn't support per-database encryption. Did I mention it's slow?
We also looked at the ongdb effort, but that went offline all of a sudden due to licensing issues. Now it's back but they reset (?) the version number. Confusing. Also, that one is built on version 3-ish, so no multi-db. While you can spin up multiple instances (it's free?), it's still Java, i.e. slow and eats memory.
1) It has no support for subgraph queries. In other words, you can't run a query on a graph and have the query result be a graph too. Instead, you will get a tabular result set. In SPARQL-based systems, you can run a 'CONSTRUCT' query. Very useful if you want to process the results by other parts of the code that also expect a graph (composability). See [1] and [2] if you want to take SPARQL for a spin.
2) It has no support for a standard graph data format. Their blog had some posts about using CSVs, but those are a tabular data format, which means that some acrobatics are needed to extract a graph from CSV (actually, two CSVs) and none of this would be standard. There have also been some attempts to fit a graph peg into a tree-shaped hole (JSON, XML). To my knowledge, RDF is the only widely used standard to actually represent graphs. Unfortunately, there is a lot of confusion around RDF because (a) RDF is actually just a model and there are multiple file formats (I recommend Turtle), and (b) RDF has a semantic web heritage (forget semantic web and just use a graph data format).
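To make point 1 concrete without tying it to any particular DBMS, here is a small plain-Python analogy. It is not SPARQL and not any product's API; the function names and the example triples are invented. The idea is only that a SELECT-style query hands back rows, while a CONSTRUCT-style query hands back triples, so its output can be fed straight into another graph query.

```python
# A graph as a set of (subject, predicate, object) triples; data is made up.
GRAPH = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
}


def select_rows(graph, predicate):
    """Tabular result, like a SELECT: a sorted list of (subject, object) rows."""
    return sorted((s, o) for s, p, o in graph if p == predicate)


def construct_subgraph(graph, predicate):
    """Graph result, like a CONSTRUCT: a set of triples, i.e. still a graph."""
    return {(s, p, o) for s, p, o in graph if p == predicate}


# The graph-shaped result composes: it is valid input to either function.
knows_graph = construct_subgraph(GRAPH, "knows")
knows_rows = select_rows(knows_graph, "knows")
```

The rows returned by `select_rows` cannot be passed back into `construct_subgraph` without first rebuilding triples by hand, which is the composability gap described above.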
But I know that industry is most familiar with Neo4j, that's why I mentioned it. To my knowledge, Stardog is one of the most advanced and performant systems (with on-prem deployment) but is very expensive. Amazon Neptune and Azure Cosmos are cloud-only, which is a hindrance for many projects. Bottom line is that graph DBMSs have a long way to go and more interest from the community is needed to motivate more dev effort.
P.S. For dense graphs, a graph DBMS may not be the best solution. Graph DBMSs also lose their appeal if your queries are not traversal-heavy.
My takeaway: https://www.youtube.com/watch?v=JNC1CpJQxzg
The engineering quality along with documentation left a pretty bad taste.
Though sometimes the fact that some aspects were really primitive was helpful for getting out of trouble.
How does Warrant deal with consistency?
This goes for every part of their stack. As a result, things like Colossus, BigTable, and Spanner effectively act like force multipliers for their engineers, because they provide the guarantees they can't get elsewhere. The fact other people at other random companies can't do that? Not their problem in the slightest, actually.
When App Engine launched, that was great for me because I could write internal tools that were mostly off the treadmill. Unless you used one of App Engine's less-used APIs (which themselves eventually got deprecated), your more obscure team-specific services could keep running.
So, lots of great technology is not necessarily great for productivity. I don't know what's happened since. I expected that launching Cloud would result in more mature infrastructure because external customers won't tolerate churn as much. I guess it's sort of true?
Most of that stuff is available to external users in Google Cloud; so why isn’t Google Cloud more popular? I don’t have hard numbers handy, but it seems to me that GCP is behind both AWS and Azure in terms of dev mindshare.
GCP has plenty of great tools, but it can also be quite awkward to use, and it’s lacking some useful stuff like lightweight edge functions.
At Warrant, we're experimenting with allowing customers to maintain searchable metadata in Warrant and exposing a "query" API[1] that can automatically hydrate objects based on that metadata.
When you access one resource it's fine to do a roundtrip, but with listing, filtering, and searching, it doesn't work if you don't join at query time. I'm not entirely sure how they achieve it, and I find it annoying that it's never mentioned, because it's a very common need.
How would you compare Warrant to other Zanzibar (ZaaS?) offerings? Particularly Ory and Authzed/SpiceDB.
As a developer of a tiny internal webapp - this is fascinating to read! I like to keep things as simple as possible, but as with anything our scope and use cases have grown over time.
Our authzn can handle some of this stuff - our rules, built atop our org's existing IAM, are very similar to these directed relationship tuples - but as we need to grow that out any further I'm excited to look into which aspects of ReBaC we're still missing.
Thanks for the link!
They've posted a number of interesting articles on the topic here, such as this one listing competing implementations (but 2y old): https://authzed.com/blog/zanzibar-implementations
If so, how did you evaluate them relative to each other and/or building yourselves?
Are there good examples of similar applications of data models for similarly niche use cases? I get that there are obviously endless data models, but this seems to extend beyond that into a more integrated concept and I don’t quite know why that seems to be the case.
So AWS groups do not nest, but Zanzibar groups do. In Zanzibar a relation on an object (an implicit set of users) can be the subject of a rule; you can define “users who have editor permission on an object also have viewer permission” in one rule. This isn’t possible in AWS; there is no way to reference the set of principals who are allowed a particular action or policy.
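A rough sketch of what that nesting looks like, in case it helps. This is a toy evaluator, not Zanzibar's actual configuration language or any vendor's API: the `check` function, the `IMPLIED_BY` table, and the example tuples are all hypothetical. The only thing borrowed from the paper is the `object#relation` notation for a userset appearing as a subject.

```python
# Tuples are (object, relation, subject). A subject is either a user id or
# a userset written "object#relation", meaning "everyone with that relation".
TUPLES = {
    ("group:eng", "member", "user:alice"),
    ("group:all", "member", "group:eng#member"),   # groups nest
    ("doc:readme", "editor", "group:all#member"),
    ("doc:readme", "viewer", "user:bob"),
}

# One rule for all objects: anyone who is an editor is also a viewer.
IMPLIED_BY = {"viewer": ["editor"]}


def check(obj, relation, user, tuples=TUPLES, implied_by=IMPLIED_BY, _seen=None):
    """Return True if `user` has `relation` on `obj`, following nested usersets."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:  # already explored (or in progress): avoid cycles
        return False
    seen.add((obj, relation))
    for o, r, subject in tuples:
        if o != obj or r != relation:
            continue
        if subject == user:
            return True
        if "#" in subject:  # subject is a userset: recurse into it
            sub_obj, sub_rel = subject.split("#", 1)
            if check(sub_obj, sub_rel, user, tuples, implied_by, seen):
                return True
    # Apply the rewrite rule: a stronger relation implies this one.
    for stronger in implied_by.get(relation, []):
        if check(obj, stronger, user, tuples, implied_by, seen):
            return True
    return False
```

Here `check("doc:readme", "viewer", "user:alice")` succeeds even though no tuple mentions alice and the document together: she is in `group:eng`, which is nested in `group:all`, whose members are editors, and editors are viewers by the single rule.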
I think that AWS policies tend to have a lot of duplicate rules because of lacking recursion. Zanzibar rules should be easier to maintain and audit.
AWS IAM is also just quite “hairy” from gradual evolution over the years. On the other hand, Zanzibar has a clean model.
It would be nice to have a compiler that would emit AWS IAM Policies given Zanzibar-style rule tuples.
Zanzibar is a Google-internal implementation of the concepts outlined in this paper, focused on managing authorization as a function of relationships between objects.
AWS IAM is primarily for AAA services with AWS, though you can use it with AWS Identity Center to provide SSO to other systems via IAM.
Zanzibar is overkill for the majority of needs and introduces far too much complexity. It is a solution that covers scenarios the likes of which you will never see. You will never grow into needing them, either. It is the pinnacle of over-engineered software. The reason people form companies offering it as a solution is to try to recover the hundreds of hours of effort they spent on something they didn't need.
And the additional element added to the tuple is reminiscent of quads, also in heavy use in RDF implementations or similar graph databases.