The negative reviews on Glassdoor are piling up and seem to share common themes, and it is not pretty.
There are accusations that management tried to lie to early investors about revenue sources.
Does anyone have any idea or insight into what is going on with them?
As long as most graph databases are just a layer of graph syntactic sugar sprinkled on top of a conventional database architecture, they won’t scale.
https://www.thatdot.com/blog/scaling-quine-streaming-graph-t...
Full disclosure: I work on this project.
I left the project. But, as far as I know, there are still planes maintained and authorized to fly in the sky, so I suppose the DB is still up in production.
I don't think a single product is going to satisfy everyone and that's a problem with the category. If by "graph database" you mean you want to do completely random access workloads you are doomed to bad performance.
There was a time when I was regularly handling around 2³² triples with OpenLink Virtuoso in cloud environments in an unchanging knowledge base. I was building that knowledge base with specialized tools that involved:
* map/reduce processing
* lots of data compression
* specialized in-memory computation steps
* approximation algorithms that dramatically speed things up (there was a calculation that would have taken a century to do exactly that we could get very close to in 20 minutes)
Another product I've had a huge amount of fun with is ArangoDB, which I have used particularly for application work on my own account. When I flip a Sengled switch in my house and it lights up a Hue lamp, ArangoDB is a part of it. I am working on a smart RSS reader right now which puts a real UI in front of something like https://ontology2.com/essays/ClassifyingHackerNewsArticles/
and using ArangoDB for that. I haven't built apps for customers with it but I did do some research projects where we used it to work with big biomedical ontologies like MeSH and it held up pretty well.
I came to the conclusion that it wasn't scalable to throw everything into one big graph, particularly if you were interested in inference, and went through many variations of what to do about it. One concept was a "graph database construction set" that would help build multi-paradigm data pipelines like the ones described above. The more sure I got about that conclusion, the more interested I got in systems that work with lots of little graphs.
I got serious and paired up with a "non-technical cofounder" and we tried to pitch something that works like one of those "boxes-and-lines" data analysis tools like Alteryx. Tools like that ordinarily pass relational rows along the lines, but that makes the data pipelines a bear to maintain: people have to set up joins such that what seems like a local operation that could be done in one part of the pipeline requires you to scatter boxes and lines all across a big computation.
I built a prototype that used small RDF graphs like little JSON documents and defined a lot of the algebra over those graphs and used stream processing methods to do batch jobs. It wasn't super high performance but coding for it was straightforward and it was reliable and always got the right answers.
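To make "small graphs passed along a stream" concrete, here is a minimal sketch. It is not the prototype described above. It assumes a graph is just a frozen set of (subject, predicate, object) string triples, and the names `graph`, `union`, `enrich` and `flag_large_orders` are hypothetical, made up for illustration.

```python
from typing import Callable, FrozenSet, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Graph = FrozenSet[Triple]       # one small graph, roughly one "document"


def graph(*triples: Triple) -> Graph:
    """Build a small immutable graph from triples."""
    return frozenset(triples)


def union(a: Graph, b: Graph) -> Graph:
    """Graph union is plain set union, so it is associative and idempotent."""
    return a | b


def enrich(stream: Iterable[Graph], rule: Callable[[Graph], Graph]) -> Iterator[Graph]:
    """Lazily apply a rule to each graph and merge the new triples back in."""
    for g in stream:
        yield union(g, rule(g))


def flag_large_orders(g: Graph) -> Graph:
    """Hypothetical rule: tag any subject whose qty is at least 100."""
    return frozenset(
        (s, "tag", "large") for (s, p, o) in g if p == "qty" and int(o) >= 100
    )


records = [
    graph(("order:1", "qty", "5"), ("order:1", "sku", "A")),
    graph(("order:2", "qty", "250"), ("order:2", "sku", "B")),
]
results = list(enrich(records, flag_large_orders))
```

Because each record carries its own little graph, the rule only looks at local data and there are no joins to scatter across the pipeline, which is the maintenance point made above.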
I had a falling out with my co-founder but we talked to a lot of people and found that database and processing pipeline people were skeptical about what we were doing in two ways. One was that the industry was giving up even on row-oriented processing and moving towards column-oriented processing, and people in the know didn't want to fund anything different. (Learned a lot about that, I sometimes drive people crazy with, "you could reorganize that calculation and speed it up more than 10x" and they are like "no way", ...) The other was that database people really don't like the idea of unioning a large number of systems with separate indexes; they kinda tune out and don't listen until the conversation moves on.
(There is a "disruptive technology" situation in that vendors think their customers demand the utmost performance possible but I think there are people out there who would be more productive with a slower product that is easier and more flexible to code for.)
I reached the end of my rope and got back to working ordinary jobs. I wound up working at a place which was working on something that was similar to what I had worked on but I spent most of my time on a machine learning training system that sat alongside the "stream processing engine". I think I was the only person other than the CEO and CTO who claimed to understand the vision of the company in all-hands meetings. We did a pivot and they put me on the stream processing engine and I found out that they didn't know what algebra it worked on and that it didn't get the right answers all the time.
Back in those days I got on a standards committee involved w/ the semantics of financial messaging and I have been working on that for years. Over time I've gotten close to a complete theory for how to turn messages (say XML Schema, JSON, ...) and other data structures into RDF structures and after I'd given up I met somebody who actually knows how to do interesting things with OWL, I got schooled pretty intensively, and now we are thinking about how to model messages as messages (e.g. "this is an element, that is an attribute, these are in this exact order...") and how to model the content of messages ("this is a price, that is a security") and I'm expecting to open source some of this in the next few months.
These days I am thinking about what a useful OWL-like product would look like with the advantage that after my time in the wilderness I understand the problem.
“I didn’t know” isn’t an acceptable excuse. When faced with unknowns, it’s your job to anticipate, mitigate, uncover problems, etc.
No one can know everything. That’s a fact. But these issues with Neo4j aren’t exactly hard to find. There are loads of folks who have talked about their negative experiences with it. Setting up a proof of concept would confirm them.
Just like a document database is not a good fit for a data model with inherent relations between data entities (simple or complex; the reverse is also true), a graph database is not a butt plug for every butt. Every problem requires an appropriately fitting solution for it.
I think this is the crux of the problem. I once worked on MegaBank's peer to peer payment app, where somebody had figured that the people sending money to each other was a directed graph, so they should use a graph DB to store it. And when Azure's sales team convinced them that CosmosDB could handle relational data and graphs and documents, they bought it hook, line and sinker.
Needless to say, this was a terrible idea: an RDBMS could have handled it just fine, and because everything else was stored in an RDBMS (which despite the marketing fluff is quite different internally in CosmosDB), now doing any kind of join was a huge pain in the ass. As a cherry on top, they were now locked into CosmosDB, which has completely incomprehensible ("request units per second") but very, very high pricing particularly for graphs. Whee!
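For illustration, here is a rough sketch of what "an RDBMS could have handled it" might look like. It uses SQLite through Python's standard `sqlite3` module purely so it runs anywhere; the `users` and `payments` tables and the `totals_sent_by` helper are assumptions of mine, not anything from the actual app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Each payment is a directed edge: sender -> recipient.
    CREATE TABLE payments (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL REFERENCES users (id),
        recipient_id INTEGER NOT NULL REFERENCES users (id),
        amount_cents INTEGER NOT NULL
    );
    CREATE INDEX payments_sender ON payments (sender_id);
    CREATE INDEX payments_recipient ON payments (recipient_id);

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO payments (sender_id, recipient_id, amount_cents) VALUES
        (1, 2, 500), (1, 2, 250), (2, 3, 1000), (3, 1, 75);
""")


def totals_sent_by(name):
    """Total sent by one user to each recipient, joined against users."""
    rows = conn.execute(
        """
        SELECT r.name, SUM(p.amount_cents)
        FROM payments AS p
        JOIN users AS s ON s.id = p.sender_id
        JOIN users AS r ON r.id = p.recipient_id
        WHERE s.name = ?
        GROUP BY r.name
        ORDER BY r.name
        """,
        (name,),
    )
    return rows.fetchall()


print(totals_sent_by("alice"))  # [('bob', 750)]
```

The edges live in the same database as everything else, so joining payments to user data is an ordinary join rather than a cross-system exercise.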
I can assure you, nobody used any Graph database to achieve any of it.
You can do all of this in the relational model, with the new support for recursive CTEs that now enables arbitrary queries to be performed. Even seamless inference of "additional" data points (often given as a unique selling point of "semantic" solutions!) is just a view, plus indexes on the underlying query if you want it to be fast.
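A small sketch of the recursive CTE point, again using SQLite via Python's `sqlite3` only because it is easy to run. The `edge` table, the `reachable` view and the sample rows are hypothetical. The view plays the role of "inferred" data (here, the transitive closure of `edge`); the index in this sketch is on the base table, and how you would index or materialize such a view depends on the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL);
    CREATE INDEX edge_src ON edge (src);

    -- Note the cycle a -> b -> c -> a.
    INSERT INTO edge VALUES
        ('a', 'b'), ('b', 'c'), ('c', 'd'), ('c', 'a'), ('x', 'y');

    -- "Inferred" facts as a view: every (src, dst) connected by some path.
    -- UNION (not UNION ALL) drops duplicates, so the recursion stops on cycles.
    CREATE VIEW reachable AS
    WITH RECURSIVE walk (src, dst) AS (
        SELECT src, dst FROM edge
        UNION
        SELECT walk.src, edge.dst
        FROM walk JOIN edge ON edge.src = walk.dst
    )
    SELECT src, dst FROM walk;
""")


def reachable_from(node):
    """All nodes reachable from `node` by following one or more edges."""
    rows = conn.execute(
        "SELECT dst FROM reachable WHERE src = ? ORDER BY dst", (node,)
    )
    return [dst for (dst,) in rows]


print(reachable_from("a"))  # ['a', 'b', 'c', 'd']
```

Callers query `reachable` like any other table and never see the recursion.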
If it weren't narrow, Neo4j wouldn't need to lay off staff.
Your examples do not refute this
So what would likely be nice is a better query language that can compile to SQL.
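As a toy illustration of the "compile to SQL" idea (not any existing language), the sketch below turns a chain of edge labels into a single query of self-joins. The `compile_path` function, the `edge(src, label, dst)` table and the sample data are all hypothetical.

```python
import sqlite3


def compile_path(labels):
    """Compile a chain of edge labels into one SQL query of self-joins.

    ["follows", "likes"] means: start -[follows]-> middle -[likes]-> end.
    Returns (sql, params); labels are passed as bound parameters.
    """
    if not labels:
        raise ValueError("need at least one edge label")
    joins = ["FROM edge AS e0"]
    where = ["e0.label = ?"]
    for i in range(1, len(labels)):
        joins.append(f"JOIN edge AS e{i} ON e{i}.src = e{i - 1}.dst")
        where.append(f"e{i}.label = ?")
    last = len(labels) - 1
    sql = (
        f"SELECT DISTINCT e0.src, e{last}.dst "
        + " ".join(joins)
        + " WHERE " + " AND ".join(where)
        + " ORDER BY 1, 2"
    )
    return sql, list(labels)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edge (src TEXT NOT NULL, label TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO edge VALUES
        ('ann', 'follows', 'bob'), ('bob', 'likes', 'post1'),
        ('ann', 'likes', 'post2'), ('bob', 'follows', 'cy'),
        ('cy', 'likes', 'post3');
""")

sql, params = compile_path(["follows", "likes"])
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('ann', 'post1'), ('bob', 'post3')]
```

A real language would need variable-length paths (which map onto recursive CTEs), properties and filters, but fixed-length patterns reduce to plain joins like this.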
Easy to operate, scale and run. We started in 2014 and in 2018 did a large scale enterprise rollout with a large customer. The performance test we put it through loaded millions of nodes and millions more edges with non trivial data and scaled to 800 concurrent users (could have been even more but for the fact that the web servers we had for this test scenario started to max out since the system was scaled for 200 concurrent and we were basically stress testing it at this point).
In the early days, there were a few edge cases of query incompatibility between versions that we caught with unit tests, but otherwise very stable, easy to operate, and easy to use. Cypher is one of my favorite query languages.
Very surprised that people had issues with it.
What is happening to Neo4j is not directly related to Memgraph. Neo4j raised a lot of cash and their investors have a lot of expectations now. This puts their sales under a lot of pressure and has pushed them to raise prices.
On the other hand, Memgraph is cheap and aims at being compatible with Neo4j from an API point of view (even though they don't share any tech background: Neo4j is Java, Memgraph is C++).
Memgraph can be a good replacement for Neo4j, but is not yet popular enough to be a menace for Neo4j in the short term.
Did Neo4j remove the enterprise edition source code? Maybe Memgraph was the inspiration for that too hmm
They first tried adding the Commons Clause, then when that did not accomplish what they wanted, they closed the enterprise source code.
The kicker is that this was not because of a behemoth like Amazon or Oracle, it was because of a single individual… (guess who?)
Good read to get a better picture:
https://sfconservancy.org/blog/2022/mar/30/neo4j-v-purethink...
What would be a good alternative for it?
Or check out TinkerPop and the databases that support it.
The cost of Neo4j also went up with their new model. (see https://neo4j.com/blog/open-core-licensing-model-neo4j-enter...)
And they did the thing with closing their source, which was nasty.
Then there's the separation of OnGDB which we looked at, but that didn't go well either. One day they deleted all of their packages. All gone. Thank God we had caches, but it took them a while to come back online. In hindsight, that was because Neo4j had sued them. I understand that, but it caused a LOT of headaches.
I feel that Graph databases are one of those things like Document databases. You probably don't need it...
I got a really good chuckle out of that.
The company was totally inflexible with their very outdated licensing model and it constantly lost them potential customers.
We landed on running RedisGraph atop Redis, and got it up and running in 45 minutes. Zero downtime. Zero complaints. Awesome.
I have enjoyed using the free version of Neo4j on my laptop but never at scale.
At the same time they seem to have put out quite a bit of marketing to developers, but it is hard to see them pitching solutions for the "enterprise" problems. Compare this to the RDF graph players, who seem more focussed on playing well with all the other parts of the existing infrastructure, e.g. virtual graphs on SQL dbs etc. (Personal bias to RDF so take that into account).
In the end we will see if the $500 million investment in market share will materialize as a sound long-term investment.
The way they build and pitch their product is straight out of the 1990s Oracle playbook.
As soon as end users learned they were getting the same software either way, guess what they picked?
(This is my blog post - iGov Inc is just me)
https://blog.igovsol.com/2018/01/10/Neo4j-Commercial-Prices....
Base R is quite slow. R + data.table is faster than Python + Pandas in a benchmark that I did recently.
For a 1 million row CSV file, Read + Sort + self-Join + Write took on a Windows box:
Base R: 47.56s
Python + Pandas: 6.44s
R + data.table: 2.99s
More details at:
https://www.easydatatransform.com/data_wrangling_etl_tools.h...
In general, a healthy emerging technology workforce should likely have ~ 20% turnover annually to stay fresh and modern. That means an average outside knowledge age of five years, which is quite long.
Some percentage of that 20% should be voluntary. If everyone stays 5 years before moving on, that's 20% turnover, with people leaving in 2 years balanced by people staying eight years, a long time in software years. Some percentage really should be so-called desired attrition, helping people find a better place.
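The arithmetic above can be checked with a back-of-the-envelope sketch. It assumes a steady-state headcount, where annual turnover works out to 1 divided by the mean tenure of hires; the function name and the 50/50 mix of 2-year and 8-year stays are my own illustration.

```python
def annual_turnover(tenure_mix):
    """Steady-state annual turnover for a mix of tenures.

    tenure_mix maps years-stayed -> fraction of hires with that tenure.
    With constant headcount, turnover = 1 / mean tenure.
    """
    total = sum(tenure_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    mean_tenure = sum(years * frac for years, frac in tenure_mix.items())
    return 1.0 / mean_tenure


print(annual_turnover({5: 1.0}))          # 0.2 (everyone stays 5 years)
print(annual_turnover({2: 0.5, 8: 0.5}))  # 0.2 (short and long stays balance)
```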
It's unlikely all hires are great fits -- impressive if only 1 in 10 would be a better fit somewhere else -- so unlikely that 10% is as indicative of problems as you worry. For reasons, most firms are incapable of grappling with that day to day, so it takes adverse externalities to push them to encourage fit and upskilling mobility that should be normal.
If a firm can learn to help people find better fits and bring in current outside skills as a regular everyday part of business (rather than once a year layoffs), the firm will be much healthier.
// Finally, consider Postgres. ;-)