I hope AGE matures a bit in the future. There are lots of use cases for Graph Databases. One I'm interested in is bitemporality. It's easy to use ltree or CTE for tree-like structures. But what if you want to move nodes in the graph at certain times? Like a device being scheduled to be in different rooms across time. And also the history of those schedules. In a graph database you can label edges with temporal attributes and then query for a view of the graph at a certain point in time and in a certain history state by filtering the edges.
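To make the device-and-room example concrete, here is a minimal sketch in plain Python of filtering bitemporal edges. The `Edge` fields, the `view` helper and the sample schedule are all made up for illustration; this is not AGE's API, just the filtering idea under the assumption that every edge carries a valid-time interval and a recorded-time interval.

```python
from dataclasses import dataclass
from datetime import datetime

FOREVER = datetime.max  # open-ended interval marker (an assumption of this sketch)


@dataclass(frozen=True)
class Edge:
    src: str                 # e.g. a device
    dst: str                 # e.g. a room
    valid_from: datetime     # when the placement applies in the real world
    valid_to: datetime
    recorded_from: datetime  # when this version of the schedule was on record
    recorded_to: datetime


def view(edges, valid_at, known_at):
    """Edges that hold at valid_at, according to what was recorded at known_at."""
    return [
        e for e in edges
        if e.valid_from <= valid_at < e.valid_to
        and e.recorded_from <= known_at < e.recorded_to
    ]


edges = [
    # Plan recorded on Jan 1: device in room A for all of March.
    Edge("device-1", "room-A", datetime(2023, 3, 1), datetime(2023, 4, 1),
         datetime(2023, 1, 1), datetime(2023, 2, 1)),
    # Plan revised on Feb 1: room A until Mar 15, then room B.
    Edge("device-1", "room-A", datetime(2023, 3, 1), datetime(2023, 3, 15),
         datetime(2023, 2, 1), FOREVER),
    Edge("device-1", "room-B", datetime(2023, 3, 15), datetime(2023, 4, 1),
         datetime(2023, 2, 1), FOREVER),
]


def rooms(valid_at, known_at):
    """Rooms the device is scheduled in, for a given time and history state."""
    return [e.dst for e in view(edges, valid_at, known_at)]
```

Asking where the device is on March 20 gives room A under the January plan and room B under the revised plan, which is the "point in time plus history state" view described above.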
Can you recommend any good references for bitemporality in graph dbs?
Every implementation is all over the place and completely non-portable. And Neo4j performance leaves much to be desired.
My personal go-to is RedisGraph paired with RedisInsight for the instant visualizations. It just feels "right" and, while not perfect, is overall intuitive.
I'm glad there seems to be continued interest in graph DBs. I think there's a lot of potential in that space and I'm eagerly awaiting a clear winner to emerge.
Apache Age: A Graph Extension for PostgreSQL - https://news.ycombinator.com/item?id=26345755 - March 2021 (45 comments)
Apache AGE: PostgreSQL-based graph database - https://news.ycombinator.com/item?id=26309560 - March 2021 (11 comments)
I’m about to give RedisGraph a try and I guess I will give this one a go as well.
The project is fine for hobby projects but it is NOT production ready.
Don’t take my word for it, though… I invite you to read through some of the issues reported in their discussion forums and to take a look at their GitHub contributions over the past year.
There was major turmoil in Dgraph Labs (the project’s maintainers) last year which resulted in the CEO and 95% of the engineers exiting the company. They are currently in a rebuilding phase, with limited staff and runway.
There are several critical bugs, which lead to either data loss, data corruption or cluster instability, which the current maintainers have failed to fix. Additionally, their customer support is often either unresponsive or unhelpful (even for paying customers).
Running a Dgraph cluster is expensive, with heavy memory utilization and favoring vertical scaling. If you need scale, then be prepared to spend big.
The documentation is not great and because very few people use this project in production, help is extremely limited.
Best of luck to you should you choose Dgraph and to anyone currently using it already.
I also played around with a graph-document database hybrid when I had downtime, but never got it close to anything usable.
A JSON document database with relations between documents is basically a property graph. I've seen a lot of document databases (RethinkDB, OrientDB, Elasticsearch, etc.) that seem close to realizing this too, but no one has run with it.
Most document databases have some sort of nested "walker" api, and if your json doc has properties that are subdocuments, will walk those. That's basically a graph api.
I wrote it as a "streaming api" so a large document/property graph could be serialized out to the client as the lookup engine walked the graph, and you don't need to fully load a complex set of documents in the query layer memory before sending it out to the client.
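A rough illustration of that streaming idea, using a Python generator. The in-memory `DOCS` store and the `refs` field are hypothetical stand-ins, not the actual design: the point is only that each document is handed to the caller as soon as the walker reaches it, rather than after the whole graph is loaded.

```python
from typing import Dict, Iterator, List

# Hypothetical in-memory store: document id -> JSON-like document.
# The "refs" field listing related document ids is an assumption of this sketch.
DOCS: Dict[str, dict] = {
    "order-1": {"id": "order-1", "refs": ["cust-1", "item-1"]},
    "cust-1": {"id": "cust-1", "refs": []},
    "item-1": {"id": "item-1", "refs": ["cust-1"]},
}


def walk(start: str, docs: Dict[str, dict]) -> Iterator[dict]:
    """Yield each reachable document once, as soon as it is visited."""
    seen = {start}
    stack: List[str] = [start]
    while stack:
        doc = docs[stack.pop()]
        yield doc  # the caller can serialize this to the client right away
        # Push in reverse so the first reference is visited first.
        for ref in reversed(doc["refs"]):
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)


# Consume the stream; a real server would write each doc to the socket instead.
streamed = [doc["id"] for doc in walk("order-1", DOCS)]
```

Only the `seen` set and the stack stay in memory on the query side, so memory use depends on the number of ids visited rather than the size of the documents.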
But I just didn't have the development horsepower to get to the various query and index capabilities. I think the general distributed design was decent and offered hybrid plain-old-table, document, and graph capabilities all in one. And Cassandra, PITA that it is, does scale linearly.
A product which lived for VC money and little more.
I'm somewhat wary of using nontrivial C extensions, having seen so many of them sometimes segfault the backend (e.g. PostGIS). There seem to be PG backend crashes described in this project's issues as well.
And in PG, there is a special method to create a process; creating threads is not possible because the logging system makes heavy use of setjmp().
Naive question from a non-C user: setjmp/longjmp just manipulate the stack, and since each thread has its own execution stack, that should be completely safe ISTM. So why is it unsafe/impossible? I'm missing something.
I wish something like Lua + LuaJIT could be used to write such extensions; at least it's memory-safe. OTOH mapping these C interfaces to Lua structures and making them work with the GC may turn out to be non-trivial.
(Also Python, Javascript, and Java)
I don't know specifics about the API coverage. It seems this extension mostly just implements new SQL visible functions and data types, which should be doable from those languages as well. Composite types might have to be defined as PG records (or json) instead of C level new PG object types.
From the documentation it seems that each graph will use a separate "namespace" in Postgres. Are there any performance costs of switching namespaces for each query?
Or do you recommend that we use a single graph with a label per customer? This option seems like it could open up some security issues if some queries forget to add this label. By using a separate graph per customer, the query will need to have a valid graph name for a customer to return any data. If it is filtered by a label, you can easily forget to add it and think everything is OK because it actually returns results.
I am a big fan of graph databases. Professionally I have used RDF data stores with SPARQL queries and Google’s Knowledge Graph with a pattern matching query mode. I play around with Neo4J, but no one has paid me to use it yet.
I think it very likely that in a year or two AGE will get better Cypher query language support and other changes, and should be a wonderful platform for combining relational and graph data stores.
Well sorry, we have PostgreSQL 15 already.
"and will support PostgreSQL 13 and all the future releases of PostgreSQL."
Currently I’m using materialized paths to efficiently return all comments but would be keen to know if AGE can help query comments for an article more powerfully.
Afaik this is pretty much the canonical way to store recursive comment trees. Or any kind of DAG.
Storing a pointer to each node’s parent or using sorted sets seems like it would make the parent poster’s query slower. Those approaches would make it easier to reparent comments, though, and they’d support arbitrarily deep trees (whereas the materialized path implementations I’ve seen limit path length).
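For anyone unfamiliar with materialized paths, here is a small sketch of the idea in Python. The fixed-width segment format and the helper names are assumptions for illustration; in a database the prefix filter would typically be a `LIKE 'prefix.%'` or ltree query rather than a list comprehension.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    path: str  # materialized path, e.g. "0001.0002"
    text: str


def child_path(parent_path: str, position: int) -> str:
    """Build a child's path; fixed-width segments keep string order = tree order."""
    segment = f"{position:04d}"
    return f"{parent_path}.{segment}" if parent_path else segment


def subtree(comments: List[Comment], root_path: str) -> List[Comment]:
    """All comments at or below root_path, in depth-first display order."""
    prefix = root_path + "."
    matches = [c for c in comments
               if c.path == root_path or c.path.startswith(prefix)]
    return sorted(matches, key=lambda c: c.path)


def depth(comment: Comment) -> int:
    """Nesting level, derived from the number of path separators."""
    return comment.path.count(".")


comments = [
    Comment(child_path("", 2), "second top-level"),
    Comment(child_path("0001", 1), "reply to first"),
    Comment(child_path("", 1), "first top-level"),
    Comment(child_path("0001.0001", 1), "nested reply"),
]

thread = [(depth(c), c.text) for c in subtree(comments, "0001")]
```

The trade-offs mentioned above show up directly: fetching a thread is a single prefix match plus a sort, but reparenting a comment means rewriting the path of every descendant, and the segment width (four digits here) caps the number of siblings.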
Recursive CTEs sound like something you would do if your total comment count in the DB is not in the six figures or so. What does HN do?
If you used "uuid-ossp" to get uuid_generate_v4(), then this is no longer necessary since Postgres 13, as there is now a built-in gen_random_uuid().
How do you do this at scale? It's generally an NP-hard problem, but I'm wondering whether something like AGE helps.
Not sure how Google etc., or even someone doing fraud detection, does this at scale.
That results in a huge text file, which you then embed as if it were normal text. The result is a normal 'word embedding' where the words are in reality the node IDs. Works like a charm. Highly scalable.
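The step that produces the text file isn't spelled out here, so the following is a guess at it: the sketch assumes uniform random walks over the graph (in the spirit of DeepWalk), with each walk written as one "sentence" of node IDs. The function name, parameters and toy graph are all made up for illustration.

```python
import random
from typing import Dict, List


def random_walks(graph: Dict[str, List[str]], walks_per_node: int,
                 walk_length: int, seed: int = 0) -> List[str]:
    """Return one line per walk; each line is a space-separated list of node ids."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    lines = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            lines.append(" ".join(walk))
    return lines


# Toy adjacency list; "d" has no outgoing edges.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}
corpus = random_walks(graph, walks_per_node=2, walk_length=5)
```

Writing `corpus` to a file gives the kind of text a word-embedding tool can consume, with node IDs playing the role of words. Each walk only needs local neighbour lookups, so the walks can be generated in parallel.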
instead of... well... throwing more hardware at it, which seems to get easier and easier these days.
P.S. Not trolling. I'm genuinely wondering if there is a better way to split the problem heuristically.
Most examples of recursive SQL I've seen only involve nodes in exactly one table, with exactly one kind of relationship/edge (for example a tree with "parent" edges). Graph DBs allow you to relate multiple different types of nodes using multiple kinds of edges. The edges can have queryable attributes, like an intermediary table in a many-to-many relationship. And all of that is still indexed efficiently.
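As a toy illustration of that data model (not how any particular graph DB stores things), here is a small in-memory property graph in Python. The `PropertyGraph` class, its method names and the sample data are hypothetical; the dictionary keyed by `(source, edge kind)` stands in for the kind of index a real engine would maintain.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph with labelled nodes and typed edges."""

    def __init__(self):
        self.nodes = {}               # node id -> (label, properties)
        self.out = defaultdict(list)  # (src id, edge kind) -> [(dst id, properties)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, src, kind, dst, **props):
        self.out[(src, kind)].append((dst, props))

    def neighbours(self, src, kind, **where):
        """Targets of src's edges of the given kind whose properties match `where`."""
        return [
            dst for dst, props in self.out.get((src, kind), [])
            if all(props.get(k) == v for k, v in where.items())
        ]


g = PropertyGraph()
g.add_node("alice", "Person")
g.add_node("bob", "Person")
g.add_node("acme", "Company")
g.add_edge("alice", "KNOWS", "bob", since=2019)
g.add_edge("alice", "WORKS_AT", "acme", role="engineer")
g.add_edge("bob", "WORKS_AT", "acme", role="manager")

colleagues_employer = g.neighbours("alice", "WORKS_AT")
old_friends = g.neighbours("alice", "KNOWS", since=2019)
```

Two node labels and two edge kinds live in the same structure, and the edge attributes (`since`, `role`) are filterable, which is roughly what a join table gives you in a many-to-many relationship.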