I feel like 90% of the applications in existence can go so far with a regular RDBMS that they never try out Neo4j... I know that's the case with me. Half the time I think I'd just try throwing Agensgraph[0] at the problem instead of jumping to the community version of Neo4J.
I've used Neo4j in the past and it seemed to be stable and efficient, not sure about how well it scales though.
Before, TitanDB (now JanusGraph) in conjunction with the TinkerPop stack was probably my favorite graph DB/stack, but I'm not sure how seriously DataStax (who owns TitanDB now) continues developing it. And JanusGraph as a project didn't seem very active to me (I could be wrong of course).
That said, while you can construct and handle graphs using relational databases, graph databases give you advantages in that they (often) have better indexing (i.e. O(1) lookups of vertices & adjacent edges) and come with graph query languages (e.g. Gremlin), which make it much easier to work with graphs compared to SQL (you can use recursive CTEs to walk graphs on the database side as opposed to the client side, but complex queries are hard/impossible to write like that). I've written a graph DB abstraction layer in the past that also supports SQL backends: https://github.com/7scientists/vortex.
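To make the recursive CTE point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `edges` table, its `src`/`dst` columns, and the helper names are all made up for illustration; a depth cap keeps the walk from looping forever on cycles.

```python
import sqlite3


def make_demo_graph():
    """Build a tiny in-memory graph: a -> b -> c -> a (cycle), c -> d."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO edges (src, dst) VALUES (?, ?)",
        [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")],
    )
    conn.execute("CREATE INDEX edges_src ON edges (src)")
    return conn


def reachable(conn, start, max_depth=10):
    """Return (node, hops) for every node reachable from start.

    The whole walk runs inside the database via a recursive CTE;
    max_depth bounds the recursion so cycles terminate.
    """
    return conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.depth + 1
            FROM walk AS w
            JOIN edges AS e ON e.src = w.node
            WHERE w.depth < ?
        )
        SELECT node, MIN(depth) AS hops
        FROM walk
        GROUP BY node
        ORDER BY hops, node
        """,
        (start, max_depth),
    ).fetchall()


if __name__ == "__main__":
    print(reachable(make_demo_graph(), "a"))
```

This handles plain reachability fine, but anything beyond it (filtering on path properties, weighted paths, pattern matching) gets awkward fast, which is the point about complex queries being hard to write this way.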
Fast forward 3 years and I am back in the graph world, this time dealing with money laundering and fraud rings. It stumps me as to why graphs are not used more in the financial world. Financial transactions are no different from social interactions. So concepts such as community detection etc. apply to the financial realm as well, and in fact we are using such concepts to determine fraud rings. I may get back to neo4j.
Btw, if you've ever stayed at a Marriott (or anyone they now own: Starwood, Hilton, Ritz Carlton) then you've used Neo4j in production (to book that room). If you've ever purchased a flight ticket, then you've used Neo4j in production (over 99% of all fare calculations are done with Neo4j). Etc.
create table room_booking (
    roomid integer,
    booking_started date,
    booking_ended date
);
and some simple SQL to check if any rows exist within the required date range would have covered it? What other aspects am I not considering?
It's an easier cultural change in thinking from RDBMS to property graph, but also not a huge improvement in terms of what you can do vs. an RDBMS. Going with a full semantic graph where the relationship is also represented by a node with unlimited relationships, and moving and thinking in hierarchies and inference, is a complete cultural change with an impressive productivity and capability improvement not possible with an RDBMS. AllegroGraph is a good example of a semantic graph which can handle trillions of triples.
I don't see anything about a semantic graph database that would prevent it from being built on top of a graph database like Neo4j (and we are doing kind of that at my current company).
RDBMS has its strengths but it isn't suitable for every use case.
I've used it in production settings where it was NOT crushing a use case. (A couple small internal HR-related sites that would've been better suited to an RDBMS.) Based on that experience, my impression is that it would take a fairly specific graph-oriented problem to get me seriously thinking about paying the costs associated with something like Neo4J when compared to a traditional RDBMS. (Less mature compared to RDBMS systems, less well understood data model, less common query language, less support from other tooling.... all of which can be quite important to the overall costs of a system.)
The reason I use a graph is for consistency from my product level business logic to my implementation.
Basically, to solve my problem, I started with a set of English statements, which yielded a grammar that I described as "objects and morphisms" (things and relationships), then implemented it in a graph, put a front end on it and built a product.
Graphs provide coherence to my problem. Could an RDBMS do this? Yes, but not without a complex intermediate query layer. I think of using a graph as analogous to specifying your problem in terms of a functional language instead of imperatively. The reason to do that is because your product is the result of maintaining consistency of an abstraction, like a DSL or a game, instead of just retrieving stored values, documents and their variations.
It's disruptive to a lot of orgs as well, since there is a lot of sunk cost in RDBMS experience, so I think the applications are all net new projects. I don't foresee anyone migrating to one, but I do see a point where the majority of new products use one.
We absolutely wouldn't be where we are without our community. From drivers and tooling to the huge amounts of support and goodwill that we see, it's a wonderful ecosystem to be a part of.
Once I learned Cypher and some common graph data modeling constructs, I found I could build more complex applications faster using Neo4j, largely because I found the graph model of my project's domain more intuitive and easier to work with than a relational database model.
Similarly, building and using GraphQL APIs has been a huge productivity win once I figured out how to build GraphQL services.
Of course, when used together, Neo4j and GraphQL have some great synergies, since it's graph all the way down ;-)
It's also worth pointing out that Neo4j has some great GraphQL integrations [1].
Thank you for your insights.
Granted, they won't let you do nearly as much as some advanced graph algorithms, but the ease with which you can use it in your operational data store is amazing. And with proper indexing, I could do a traversal in hundreds of milliseconds.
https://www.postgresql.org/docs/current/static/queries-with....
It's also however a bit of a dead end once you go beyond the basics. The costs of joins get worse the deeper you go, and "hundreds of milliseconds" is at least an order of magnitude slower than what Neo4j would do for you.
Once you take that major performance penalty, and then layer it into more complex graph algorithms or analytics, it gets really, really painful quickly. Granted, you might not notice this if you never needed to go further than 2-3 hops in a graph. But once you start working with graphs you're not going to want to stick to such basics.
More technical detail on the difference between a graph abstraction on top of another database, and a native graph database, can be found here:
So say A points to B you have 3 tables right? Table 1 for A, Table(2) for B and a join table (3) showing that A points to B right? Why would you do that? What's stopping you from having one table that contains A and what A points to? So you only have 2 tables?
What if you have a node that can point to many items, a column can contain a list in postgres, so we can still have one Table containing your node data and a list of items they point to.
I'll concede that graph databases are easier to write queries for; most people already struggle with basic SQL, let alone CTEs and recursive CTEs.
I'm yet to be convinced that a problem can't be reshaped and mapped onto a traditional RDBMS and yet remain performant.
https://stackoverflow.com/questions/52674380/improving-postg...
Also MS SQL Server supports graphs natively: https://docs.microsoft.com/en-us/sql/relational-databases/gr...
Also, the official site says it's a commercial product. I wonder how many features are supported in the community/open-source/free edition?
I guess if I ever need a graph database again I will probably go for dgraph (although I haven't used it in any production environment) - https://dgraph.io or any other graph database that at least has HA setup without 100k/year bill :)
https://neo4j.com/developer/guide-cloud-deployment/
As others pointed out in other threads, this can be done for free for startups of a certain size (https://neo4j.com/startup-program/), and eval licenses are available (https://neo4j.com/lp/enterprise-cloud/?utm_content=aws-marke...)
It is still a research thing, but I am starting to see occasional papers on inducing relations in graphs using deep learning. If this proves useful, that should help the growth of graph databases and the use of knowledge graphs.
However, as another commenter ‘hardwarsofton’ said, just using Postgres is very often all I need.
I think we're in the early parts of an exciting journey connecting (no pun intended) graphs and AI. I'm personally really excited about connected feature extraction. I wrote a little bit about it here: https://neo4j.com/emil/80-million-series-e/ There's more in depth info here in this graphs & AI overview video from GraphConnect last month: https://neo4j.com/graphconnect-2018/session-topics/?topic=AI...
One would think that Neo4j probably is the most stable one? But it's unclear to me if the full version is open source or not? [1][2]
Has anyone tried several?
Has anyone tried Dgraph? [3]
[1] Community is limited according to Wikipedia
[2] Trying to download enterprise takes me to a "Start a Free 30-Day Trial" page
[3] https://dgraph.io/ ?
As a graph database, it has some non-typical tradeoffs. You can't easily discern incoming edges and there's no true node deletion. There's a pretty narrow happy-path where the DB works as advertised/expected, but it's just a fairly young DB from an understaffed startup. Probably worth waiting a year or two for the kinks to be ironed out.
Congrats to Neo4j on the raise! I hope it changes the perception of US VCs w.r.t. graph DBs, who are falling behind the dev enthusiasm and readiness for adoption in this field.
As the Jepsen report mentioned, it had identified 23 issues, 19 of which were resolved before the report was released and another one right after. Dgraph has come a long way since the v1.0 release in terms of production stability. I'd recommend trying out the latest v1.0.9 release or the upcoming v1.0.10.
Dgraph itself is close to being launched in production at a few very big and well-known companies (that we can't mention publicly yet), who moved away from Neo4j to Dgraph. Needless to say, Dgraph's performance and scalability far exceed any other graph DB in the market.
Dgraph is tackling the much harder problem of doing distributed joins and traversals, while providing distributed ACID transactions, synchronous replication and linearizable reads. It's the equivalent of Spanner, but one which can also do efficient joins (something relational DBs suck at, so technically more complex). There's no graph product out there like this or even a single paper which Dgraph is based on, rather we had to do original research to perfect this technology -- which is why it took time to build and stabilize Dgraph.
Badger, the underlying kv DB, itself was never found to have an issue. It is serving several petabytes of data in production use at various companies. We built Jepsen style bank tests for Badger, which run successfully nightly, and there's an open bounty of $1337 for finding any data loss bugs in Badger.
Dgraph is decently staffed (7 engineers) for a seed-stage startup, but we're definitely hiring and planning to grow in SF. No need to wait, this is the right time to run Dgraph in production.
I quite like it, but I haven't gotten to the 'running in production' part so I haven't experienced what it's like to actually manage or scale it, only its query language and setup.
We wanted to use Neo4j at my last job (in Mexico). However, we found the commercial version was prohibitively expensive, and the free version did not work for real-life problems.
So I think this is a place where open source alternatives would have been welcomed.
And as far as I can remember from a former project the scalability is pretty limited (but this could have changed).
And there is a huge scalable graph stack with Cassandra - Datastax Enterprise Graph, Titan, JanusGraph (where Google is involved), Tinkerpop, etc.
The production readiness of neo4j is something I'm still not quite sure about.
It truly shines as an embedded graph db though. I wonder if there is a Blockchain story around neo4j (as a replacement for leveldb) that makes this more interesting. After all there is a lot of excitement around DAG based blockchain alternatives.
This may help non-graph folks understand the community a bit. Neo4j has a bunch of cool bits, and it's been a pleasure watching them bring two specific "aha!" moments to customers. Our tech helps teams build scalable visual workflows that include visual graph, so we're often brought in near the beginning of a graph project, and have repeatedly seen two situations where a DB at Neo4j's quality shines:
1. Performance: A team starts using their existing data stores -- SQL, Splunk, etc. They'll get quite far. Often, however, they will hit some query that just cannot perform. E.g., for two bank accounts, all paths between them. For different DBs and workloads, these can be different things.
2. Ease: Asking for something like a 360 view around a device, user, patient, account, etc. is hard in SQL - you don't know what column, table, etc. to look at. Or imagine the above paths query. Cypher makes writing this stuff EASY, so in a world where a lot of people can barely do SQL, that's a superpower.
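For readers who haven't run into the "paths between two accounts" problem, here is a rough sketch of what the query has to compute, in plain Python over an in-memory adjacency dict. The account names and the `transfers` structure are invented for illustration; this is not how Neo4j or any other engine implements it, only the shape of the work that a relational store would have to express as a stack of self-joins.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: returns one shortest path as a list, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def all_simple_paths(graph, source, target, max_hops):
    """Depth-first enumeration of every cycle-free path up to max_hops edges."""
    paths = []

    def dfs(path):
        node = path[-1]
        if node == target:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_hops:
            return
        for nxt in graph.get(node, ()):
            if nxt not in path:  # never revisit a node on the current path
                path.append(nxt)
                dfs(path)
                path.pop()

    dfs([source])
    return paths


# Hypothetical money flows: each key transferred funds to the listed accounts.
transfers = {
    "acct_A": ["acct_B", "acct_C"],
    "acct_B": ["acct_D"],
    "acct_C": ["acct_D"],
    "acct_D": ["acct_E"],
}

if __name__ == "__main__":
    print(shortest_path(transfers, "acct_A", "acct_E"))
    print(all_simple_paths(transfers, "acct_A", "acct_E", max_hops=4))
```

The number of simple paths can grow exponentially with the hop limit, which is one reason this kind of query tends to be the first thing that falls over on a general-purpose store.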
Neo4j has been broadening by entering the scaleout world, app dev world, and adding multi-modal & ML capabilities, which are all important things and help grow the eco-system. Congrats again!
What we are seeing here is the 'commodification' of graph, a trend that happens in technology in general. Companies that launched ten years ago on a massive investment, with their own proprietary graph technology (I'm talking the likes of Twitter, Facebook and so on), built features that today could be implemented with a fraction of the investment by leveraging Neo4j.
This funding will broaden the reach of graph technology, while reducing the overall cost for individual organisations to adopt it. Social networks, recommendation engines and fraud detection systems are all now easily within reach. Check out our own free and open-source recommendation engine, which was built on top of Neo4j, for example: https://github.com/graphaware/neo4j-reco.
We live in exciting times. While the commodification of what we call 'graph 1.0' is in progress, what Tesla's head of A.I. Andrej Karpathy brands "Software 2.0", that is, the intersection of machine learning and software development, is rapidly picking up pace. We're only at the beginning of the hype cycle on this. And guess what? It is an established fact that graph is playing a central role in this transformation process.
We are proud to say that our organisation is at the forefront of using graph technology to derive insight and meaning from unstructured data - we call this GraphAware Hume. We're really excited about this!
As you can see, we are pretty passionate about graph technology, and Neo4j in particular, and in our opinion we're at the beginning of what is going to be a very transformative adoption. If you're thinking about exploring how graph might fit into _your_ organisation, of course feel free to reach out.
Disclaimer: GraphAware (https://graphaware.com/) is Neo4j's solution partner
Our schema involved taking physical assets/personnel and representing them as different labels: machine, factory, production line, user, usergroup, etc. We then drew complex relationships between different user/groups in the organization and the assets they were responsible for.
At first, we used a relational database, but it soon became difficult to go more granular than simply: user belongs to usergroup, usergroup belongs to client, client has factories, factories have lines, lines have machines.
As many have pointed out here, it's not that you can't do this with non-graph databases, it just requires a more complex query layer. Neo4j allowed us to represent complex business relationships as natural language, and that really helped us as the business scaled.
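As a rough illustration of the kind of model described above, here is a small Python sketch with labeled nodes and typed relationships. Every node name, label and relationship type (`BELONGS_TO`, `RESPONSIBLE_FOR`, `HAS`) is invented for the example and is not the actual schema; the point is only that "which machines is this user responsible for?" becomes a single traversal rather than a fixed chain of joins.

```python
from collections import defaultdict

# Hypothetical labels for each node.
labels = {
    "alice": "user",
    "bob": "user",
    "maintenance": "usergroup",
    "factory_1": "factory",
    "line_1": "production line",
    "line_2": "production line",
    "press_1": "machine",
    "press_2": "machine",
    "robot_1": "machine",
}

# Hypothetical typed relationships: (source, relationship, destination).
edges = [
    ("alice", "BELONGS_TO", "maintenance"),
    ("maintenance", "RESPONSIBLE_FOR", "line_1"),
    ("bob", "RESPONSIBLE_FOR", "press_2"),  # responsibility at machine level
    ("factory_1", "HAS", "line_1"),
    ("factory_1", "HAS", "line_2"),
    ("line_1", "HAS", "press_1"),
    ("line_1", "HAS", "robot_1"),
    ("line_2", "HAS", "press_2"),
]


def build_index(edge_list):
    """Index outgoing relationships by source node."""
    out = defaultdict(list)
    for src, rel, dst in edge_list:
        out[src].append((rel, dst))
    return out


def machines_for(user, out_edges, node_labels):
    """Follow membership, responsibility and containment edges from a user
    and collect every node labeled 'machine' that is reached."""
    follow = {"BELONGS_TO", "RESPONSIBLE_FOR", "HAS"}
    seen = {user}
    stack = [user]
    found = set()
    while stack:
        node = stack.pop()
        if node_labels.get(node) == "machine":
            found.add(node)
        for rel, dst in out_edges.get(node, ()):
            if rel in follow and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return sorted(found)


if __name__ == "__main__":
    index = build_index(edges)
    print(machines_for("alice", index, labels))
    print(machines_for("bob", index, labels))
```

Note that responsibility can attach at any level (a group to a line, a user directly to a machine) without changing the traversal, which is the granularity that gets painful with a fixed user/usergroup/client/factory/line/machine table chain.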
Java will always be king of enterprise despite all the drama with the future of the JRE.
I used to use OrientDB but moved away due to stability issues (long ago now so hopefully they have that sorted). I also just noticed they’ve been acquired by SAP!
For me graph layouts are conceptually superior vs relational when explaining to non techie users.
Practically however I now stick to Postgres - it’s ‘good enough’ (for what I’m doing) and has a heap of benefits in and of itself.
I looked at AgensGraph but I couldn’t get enough info on it, plus it is a custom version of Postgres (not a plug-in) and I think they recently switched to AGPL which makes it overall less exciting to investigate.
I know there are places where having a real graph db helps but I’ve not personally hit those scenarios yet.
I'm also interested in people's opinions of Neptune in that domain.
Mostly for medical reasoning on RDF OWL ontologies such as "Ontology Development and Debugging in Protégé using the OntoDebug Plugin"[1]