What is behind the thought that graph databases are going to grow so much in the next few years? To me they've always had a niche use... Are they really going to be ubiquitous (like this funding seems to assume?)
As for those specific figures, I'm guessing there's enough wiggle room in "data and analytics innovations" (emphasis mine) to find or project almost any trend one wishes. What are data analytics innovations? Why, it's the set of things that will see 80% use of graph technologies! "Graph technologies" is also so potentially-vague that it could plausibly be 100% of almost anything related to software.
Relational data may be a hassle, but it's a hassle you end up having to deal with anyway at some point.
I can see a graph database as being a useful place to stash a ton of shitty data as an initial place to start an ETL but I can't imagine using it as a system of record except in very limited situations.
The opportunity I understood after using Neo was that the big product play would be a kind of mental shift for enterprise data analysts whose jobs live in Excel/Power BI today, with power users on Cognos, and less for devops/SaaS companies/etc. I over-use Apple as an example, but if Apple entered enterprise data products, Neo would be the kind of underlying tech for it: an Apple'ey analytics tool would have users producing and reasoning about their data with graphs instead of tables. Imagine a kind of Photoshop for data, or a fundamental conceptual change from spreadsheets to graphs. They aren't as competitive as a data tool, but I think they are unrivaled as a knowledge tool.
The tech is really great, but the product piece appears to have been a challenge because the use cases for graphs have been very enterprise'y, which has limited adoption because people who operate at that higher business logic level of abstraction that graphs enable are not the people picking and adopting new technologies. The growth will come from younger people who learned python in high school, and have a more data centric world view. Maybe that's the play.
Anyway, as a user I can see why they got participation on an F round. Imo, they've solved the what/how/why and have done some amazing science and engineering, and what I hope that money buys them is some magic.
> my impression of dgraph was they had a bunch of unnecessary and poor abstractions in their documentation
I'm surprised to hear that. Dgraph uses GraphQL (and DQL, a fork of GraphQL) as its query language, which is a much more widely adopted language than Cypher. Dgraph users really like the simplicity and intuitiveness of the language and the ease of use of the DB.
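For anyone who hasn't seen it, the appeal is that Dgraph's GraphQL layer generates the CRUD operations from a plain type definition. A minimal sketch (the `Person` type and field names here are invented for illustration):

```graphql
# Schema: Dgraph auto-generates add/query/update/delete operations from this.
type Person {
  id: ID!
  name: String! @search(by: [term])
  friends: [Person]
}

# One of the auto-generated queries: find people named "alice" and their friends.
query {
  queryPerson(filter: { name: { anyofterms: "alice" } }) {
    name
    friends { name }
  }
}
```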
I'm curious what was confusing in documentation.
Literally every dgraph user must necessarily know the answer to that already, or maybe they just mentally black-box it and work around it, but at the time my impression was that non-users don't know this, and if I'm adopting a whole new taxonomy I need extra incentives to know it's worthwhile. It's probably an excellent and even superior technology, but what read as auteurism in the product at the time made me reconsider how much time I wanted to invest before encountering another one.
Anyway, coming from being a Cypher user, the learning curve for the use case of "I want to create nodes of different types with attributes, and relationships of different types with attributes, then CRUD those vertices with a Flask app" felt a bit steep after that.
SQLite would do the trick, but I wanted consistency from my business logic to a grammar to a data model. It's very easy to encounter graphs and just think we're not smart enough for them or our problem isn't graphy enough, but given the ease with which I could encode a grammar into Cypher, I reluctantly gave up on dgraph. That said, I'm not a Gremlin/TinkerPop fan either; from a top-down user use case, it wasn't satisfying.
DGraph has a lot of users and customers who love your product and the smartest people I knew recommended it to me, so my issues might not register, but there were a few experiences going through the tutorials that made me wary I was sinking costs into it relative to my use case, e.g. I have 1 week to build a PoC Flask app with a graph on the back end, and then scale it if the customer cares. That's what I used Neo for, and didn't use dgraph for, even though I figured I'd hire developers to rewrite it for dgraph if it got off the ground.
Anyway, long way round, but I'm a long time believer and user of graph techs and want everyone in that market to succeed.
But I do appreciate all the effort Neo4j put for years in educating us all on graph databases, use cases, and just drawing attention and awareness.
Neo4J has been very meh in my experience, but they are the biggest.
That said, the 2nd system never got off the ground; I quit the badly run startup before finishing it. And now that I have a bit more experience with Neo4j, I'd say it would have been a bear to fully implement. Java is too heavy and Neo4j is a memory pig. It works, and I can't say it is bad or even iffy like TinkerPop, but it is "Enterprise Software" and everything that is associated with that meme.
I have been using TigerGraph for my latest research into modeling the schedules etc. of rail transport. It is much faster than Neo4j and requires far less memory: I can store every bit of data I need in it, unlike with Neo4j, which would need multiple 64GB-RAM servers. Its programming language is also pretty nice once you get the hang of it.
So I'd recommend TigerGraph. The downside is that it is not as 'plug and play' as Neo4j, does not have all the mindshare/fancy bells and whistles, and is entirely C++/Unix based. So having some UNIX sysadmin experience is helpful unless you want to use their cloud solution.
I think there's plenty of room to disagree with this view that modeling graph data in SQL is not "logical enough". Though to be fair, there seems to be some ongoing work on adding some "property"-based layer to bare SQL in order to make common graphs-and-properties oriented queries a bit more ergonomic.
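As a sketch of that view, here is one common way to fake a property graph on top of a plain relational database: a node table and an edge table, with properties stashed as JSON. The schema and data are invented for illustration (using SQLite via Python's stdlib, assuming a build with the JSON functions available):

```python
import json
import sqlite3

# Hypothetical property-graph-on-SQL schema: one node table, one edge table,
# with properties stored as JSON text in a single column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, props TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER, label TEXT, props TEXT);
""")
conn.execute("INSERT INTO nodes VALUES (1, ?)", (json.dumps({"name": "alice"}),))
conn.execute("INSERT INTO nodes VALUES (2, ?)", (json.dumps({"name": "bob"}),))
conn.execute("INSERT INTO edges VALUES (1, 2, 'KNOWS', '{}')")

# "Who does alice know?" is just a pair of joins.
rows = conn.execute("""
    SELECT json_extract(n2.props, '$.name')
    FROM nodes n1
    JOIN edges e  ON e.src = n1.id AND e.label = 'KNOWS'
    JOIN nodes n2 ON n2.id = e.dst
    WHERE json_extract(n1.props, '$.name') = 'alice'
""").fetchall()
print(rows)  # [('bob',)]
```

The ergonomics argument is about exactly this: the query works, but every hop costs a join and a `json_extract`, which is what the "property layer" proposals for SQL are trying to smooth over.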
File under, "not sure if a very good joke, or serious".
I'm leaning toward the former. "New tech lead" is the give-away (or is it?).
I'll also say that working on the entire graph, if you need to, is difficult. They're not oriented around working on the whole graph, more around fragments that you've pared down through your query modifiers, so if you know you're going to be doing a lot of work that requires touching the entire graph, that may change the performance characteristics a lot for you.
I like it and would use it again but there are rough edges to work around still and it is young so know your use case and know the trade offs you're making.
My concerns basically range around memory consumption, query language and language ecosystem.
Edit: Oh and I guess around like functional extensibility. The last time I used a graph DB I had to export from the db itself to HDFS and use Spark to do things like PageRank and I'd rather be able to write that natively in their query lang or some like UDF equivalent.
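For context, the kind of whole-graph computation meant here is something like PageRank: a toy power-iteration version fits in a few lines of plain Python (graph and node names made up), which is roughly what you'd want to express natively in a query language or UDF instead of round-tripping through HDFS and Spark.

```python
def pagerank(edges, damping=0.85, iters=50):
    """Toy power-iteration PageRank over a list of (src, dst) edges."""
    nodes = {n for e in edges for n in e}
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for d in out[n]:
                    new[d] += share
            else:  # dangling node: spread its rank across all nodes
                for d in nodes:
                    new[d] += damping * rank[n] / len(nodes)
        rank = new
    return rank

ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
# ranks sum to ~1.0; "c" collects the most in-links in this toy graph
```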
This approach of extending PostgreSQL is very appealing to me. There is a great deal of value in the PostgreSQL stack that doesn't need to be reinvented just to deliver a graph database and query language. How much easier is it to adopt graph database techniques when it is simply an extension of database technology nearly everyone is already running? Conceivably one might find some future version of PostgreSQL RDS (and other cloud PostgreSQL services) delivers Cypher.
That being said, I thought about porting it to PostgreSQL with Apache AGE vs. using Neo4j for a project, because it's faster, at least for this use case. Easier said than done, though.
If you want to play with graphs and linked-data it's super cool. There is also structr[2] that builds CMIS / Alfresco ECM like functionality atop neo4j with graaljs scripts.
Seems the concept of having fluid relationships is appealing for querying but not structuring/storing... which seems like a disconnect.
I have only seen a few Neo4J systems in serious production workloads and they were ALL on logistics... I'm not sure that it's being positioned (or interpreted) as a nice simple solution to start out on.
Edit: I just checked out neo4j "bloom", and it's definitely a good way to make graph more accessible - they should continue to build further on it.
It's also torturing the definition of "query language." There is no equivalent of "join", or any other typical query feature such as aggregation, grouping, sorting, filtering. GraphQL has as much to do with graphs or query languages as my smart TV has to do with intelligence. It's RPC, but RPC fell out of fashion when SOAP/WSDL/XML died.
When I started back then in 2016 with it, it was pretty cool how directly graphql mapped to the graph model in the db.
Community Edition is hobbled to the point where I wouldn't recommend anyone run it in production.
Or if you absolutely need on premise and are small there is the startup program for free enterprise licenses (https://neo4j.com/startups/)
Many companies, like the one I'm at, have the opposite use case -- many, geo-distributed, tiny graphs and multiple (read: 3-5) pre-prod environments. They simply don't have a pricing model that supports customers like us.
They wanted to charge us something like 10% of our ARR for something that was just a component of one microservice.
We did evaluate Neo4j, but set it aside due to its slowness and its complex query language (Cypher). It was a really awkward language, super awkward.
We also evaluated arangodb and we found it much better than Neo4j. Performance was good and its query language was better too.
What we realised in the process is that adopting graph databases is a cultural transformation as well. SQL is much better understood, better adopted, and better supported by the community.
Ultimately we implemented the use case in Postgres, and thank God we did it that way. IMO, we can still get all the benefits of graphs with SQL databases with little effort.
We had a team member who had used Neo4J professionally for years and could not figure it out. And we only had one; every other teammate and new volunteer had to be trained in a strange new way of thinking about databases and a new query syntax. Setting it up to run locally for development was a difficult process. Progress was slow and our code to access the database was messy. We kept being promised that, in exchange for these heavy burdens, Neo4J would do amazing things for us once we started doing graph queries, but we never got there because it couldn't do the basics.
We rewrote the project to run on PostgreSQL. Five tables, properly indexed, lightning fast, easy to set up and understandable by anyone. A hundred million rows and it didn't break a sweat, on the lowest tier of machine. Even graph queries were straightforward and quick.
Advice: Don't use Neo4J as your primary store, and avoid it altogether if you want volunteer or casual contributors. For us, it was all costs and no benefits.
There's more than abundant amounts of capital in private equity, so the only real reason to go public is to create liquidity for early founders/investors/employees who want to cash out. Given that, arguably you could say going public, instead of raising private capital, is the smell. Or at least an attempt to top-tick the valuation, e.g. WeWork.
It means there's a lot of capital being dumped into trying to find some hidden source of profit and it's getting harder and harder to find it.
It's the capital equivalent of going from finding oil in your back yard to blasting it out of tar sands in the Canadian tundra. Sure the capital/oil keeps flowing, but the inherent unsustainability of the system starts to show its face more clearly.
With relational databases, you can join on anything at any time, so you can explore new relationships as you go.
Why isn’t this sufficient to explore novel relationships?
I was thinking more about technical reasons in terms of the storage layer. The query syntax seems to be the least interesting part of a database, to be perfectly honest.
Yikes
> largest investment in a private database company
I guess this is one of those PR moves that is trying to make something lame sound good? If your customer portfolio includes Walmart, Volvo and AstraZeneca why are you raising money a 6th time?
I take your point that this is a really late round of funding, but this doesn't mean they've caught on like they want to yet.
By that time, the share is pretty much what it's worth, but 100 times the round-A price, if all went well of course. Over 90% of the time, it didn't go well. Who knows how that will end for Neo4j, but the investors have their eggs in many other baskets anyway.
What matters isn't showing profit anymore, not even significant revenue to justify further funding. All you need is some appealing growth figures, sometimes not even that, just a convincing argument that hyper growth is on the horizon.
At some point millions are put into advertising and a strong sales force to grow revenue manyfold. In the enterprise market, the trick often works pretty well.
If you are using a relational DB, you can use recursion to achieve the same effect without having to bring Neo4j and Cypher into your stack.
A simple example of implementing a hierarchical graph data structure on postgres and exposing it via graphql can be found on the hasura blog.
https://hasura.io/blog/authorization-rules-for-multi-tenant-...
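The "just use recursion" point boils down to `WITH RECURSIVE`. A minimal sketch of walking a hierarchy upward (table name and data are invented; SQLite via Python's stdlib stands in for Postgres, since both support recursive CTEs):

```python
import sqlite3

# Invented org hierarchy: each row points at its parent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
    INSERT INTO org VALUES (1, NULL, 'ceo'), (2, 1, 'vp'), (3, 2, 'eng');
""")

# Recursive CTE: start at 'eng' and repeatedly join to the parent row,
# i.e. graph traversal without a graph database.
rows = conn.execute("""
    WITH RECURSIVE chain(id, parent, name) AS (
        SELECT id, parent, name FROM org WHERE name = 'eng'
        UNION ALL
        SELECT o.id, o.parent, o.name FROM org o JOIN chain c ON o.id = c.parent
    )
    SELECT name FROM chain
""").fetchall()
names = [r[0] for r in rows]
print(names)  # walks up the chain: eng -> vp -> ceo
```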
Has anyone tried that? Would love any notes/pointers!
Join data across the two to get the best of both, basically. Hasura doesn't support Neo4j natively yet, but maybe you could use Neo4j's GraphQL wrapper as an input to Hasura?
One thing that is much easier to model and query, or rather more natural and simple, is authorization and other granular questions you might have about how users and data are connected.
A thing that I can’t wrap my head around however is temporal data modeling with graphs. Haven’t seen or thought of anything too satisfying yet, that meshes well with how I think about graphs. Whereas in SQL it is more explored and clear to me.
I agree that their marketing is very aggressive, but this tech has quite some merit.
The capability of sales to sell a product to massive companies for a use case that we're actually not very good at was unbelievable.