* Flexible knowledge association i.e. Knowledge Graphing
* Modeling and querying associations where the requirements are many steps removed
* Expert Systems / Inference Engines
* Lazy traversal for complex job scheduling
Graph DBs are not a good general-purpose database for the 95% of use cases; just use Postgres/MySQL if you're not sure. We use Neptune (AWS's managed graph DB) to model cybersecurity dependencies between many companies and to report on supply-chain vulnerabilities many steps removed. Those kinds of queries are non-trivial and expensive on anything but a graph database.
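A minimal sketch of that kind of many-steps-removed query, in plain Python over a toy dependency map (all company names and edges here are made up; in production this traversal runs inside the graph database itself rather than in application code):

```python
from collections import deque

# Hypothetical dependency edges: company -> direct suppliers it depends on.
# In Neptune these would be edges in the graph; a plain dict stands in here.
DEPENDS_ON = {
    "acme": ["libco", "cloudco"],
    "libco": ["cryptoco"],
    "cloudco": ["cryptoco", "dnsco"],
    "cryptoco": [],
    "dnsco": [],
}

def transitively_affected(vulnerable):
    """Return every company that depends on `vulnerable`, any number of hops away."""
    # Invert the edges so we can walk from the vulnerable supplier upward.
    dependents = {}
    for company, suppliers in DEPENDS_ON.items():
        for s in suppliers:
            dependents.setdefault(s, []).append(company)
    seen = set()
    queue = deque([vulnerable])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

In a relational schema the same question becomes a recursive self-join of unknown depth, which is exactly the shape of query that gets expensive outside a graph store.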
Since graph DBs meet niche query requirements, you usually have other databases involved in the full application. If you want to tractably manage many databases in one system, you ideally want to be in streaming / event-sourced semantics. If you're already in an imperative CRUD-around-data / batch pipeline, you'll find greater maintenance costs in adopting a graph DB, or any additional DB for that matter.
The hard part is that ES/streaming systems work best with, and almost necessitate, a clean and clear domain model. A clean and clear domain model requires a lot of discussion and consensus with domain experts and product owners. Getting buy-in for those discussions is the source of most of the issues I've experienced with these kinds of systems. CRUD can paper over a lot of cloudy abstract concepts, for better or for worse. These discussions are energy-intensive: casting light on cloudy thoughts is mentally painful.
There isn't great streaming/ES support at the language level outside of the robust actor-model systems (e.g. Erlang/Elixir). There are systems like Akka that simulate that to some extent on runtimes like the JVM, but a cooperative scheduler and an actor model don't mix well. For the non-actor-model aspects I've been seeing service-level dataflow systems like KSQL / MaterializeDB gain traction, but they are nevertheless a solution for read models, not application logic.
I'd recommend trying non-relational databases even if you're not sure.
Which I suppose kind of typifies the problem. Graph databases are fantastic because they let you flexibly and coherently model practically anything. But, perhaps precisely because of this, they can become an impediment once you better understand the nuances and idiosyncrasies of your domain, and need something with more optimal (or at least more predictable) performance for the questions you know you need to ask, over a representation of your data that you know is sufficient.
I do think there's a valid question of how useful n-hop queries are for an N that is greater than 2 or 3.
It's the most general-purpose means I can see to model entities. I can't see many invalid uses.
Maintenance re-started in 2017, with IBM & Google stepping up to back it[2].
[1] https://github.com/JanusGraph/janusgraph/milestones?state=cl...
[2] https://architecht.io/google-ibm-back-new-open-source-graph-...
I tried Gremlin, but it feels like an imperative DSL. Cypher queries are more readable, but limited to Neo4j. I'm looking forward to openCypher, or maybe a variation on Facebook's GraphQL.
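To make the contrast concrete, here's roughly the same friends-of-friends question in both languages (illustrative snippets against a hypothetical person/knows schema, not run against a live server):

```
// Gremlin: an imperative-feeling chain of traversal steps
g.V().has('person', 'name', 'Alice').out('knows').out('knows').dedup().values('name')

// Cypher: a declarative pattern match
MATCH (a:Person {name: 'Alice'})-[:KNOWS*2]->(fof)
RETURN DISTINCT fof.name
```

The Gremlin version spells out each hop as a step in a pipeline; the Cypher version states the shape of the result and leaves the traversal to the engine.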
(hundreds of thousands of folders)
The idea would be to use a graph DB for a first query to get the file ids in scope (all files inside a given folder and its subfolders) before running the actual SQL query, e.g. creation_date < foo AND file_id IN [array from graph DB output].