Apache AGE, a PostgreSQL extension with graph database functionality

(github.com)

140 points · flymetothemoon · 3y ago · 76 comments

canadiantim3y ago· 8 in thread

Could it be efficient to use Apache AGE for e.g. retrieving all comments on an article?

Currently I’m using materialized paths to efficiently return all comments, but I would be keen to know if AGE can help query comments for an article more powerfully.

somebee3y ago

The ltree extension (https://www.postgresql.org/docs/current/ltree.html) is perfect for this use case.

asah3y ago

How does ltree compare with jsonb?

andrewstuart23y ago

Have you tried recursive CTEs with a simple id, parent_id etc schema? These should perform very well if those columns are in an index.

Afaik this is pretty much the canonical way to store recursive comment trees. Or any kind of DAG.
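For readers who have not seen the pattern, here is a minimal sketch of the id / parent_id schema queried with a recursive CTE. It runs against SQLite through Python's standard library only so that it is self-contained; the table name, column names, and sample rows are made up for illustration, and the same WITH RECURSIVE query shape should carry over to PostgreSQL.

```python
import sqlite3

# In-memory database with a hypothetical comments table (adjacency list).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        id         INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL,
        parent_id  INTEGER REFERENCES comments(id),  -- NULL for top-level comments
        body       TEXT NOT NULL
    );
    CREATE INDEX comments_parent_idx  ON comments (parent_id);
    CREATE INDEX comments_article_idx ON comments (article_id);

    INSERT INTO comments (id, article_id, parent_id, body) VALUES
        (1, 10, NULL, 'first'),
        (2, 10, 1,    'reply to first'),
        (3, 10, 2,    'nested reply'),
        (4, 10, NULL, 'second'),
        (5, 99, NULL, 'comment on another article');
""")

# Anchor member: top-level comments of the article.
# Recursive member: children of rows already in the result.
THREAD_SQL = """
WITH RECURSIVE thread(id, parent_id, body, depth) AS (
    SELECT id, parent_id, body, 0
    FROM comments
    WHERE article_id = ? AND parent_id IS NULL
    UNION ALL
    SELECT c.id, c.parent_id, c.body, t.depth + 1
    FROM comments AS c
    JOIN thread AS t ON c.parent_id = t.id
)
SELECT id, parent_id, body, depth FROM thread ORDER BY id
"""

def fetch_thread(article_id):
    """Return every comment of one article together with its nesting depth."""
    return conn.execute(THREAD_SQL, (article_id,)).fetchall()

rows = fetch_thread(10)
for comment_id, parent_id, body, depth in rows:
    print("  " * depth + f"[{comment_id}] {body}")
```

The index on parent_id is what keeps each recursion step a lookup rather than a scan, which is the condition the comment above attaches to "perform very well".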

akshayshah3y ago

As long as comments are a tree, there’s only one path from the root (the post) to an individual comment. How would a recursive CTE perform better than a prefix scan on an indexed string column?

Storing a pointer to each node’s parent or using sorted sets seems like it would make the parent poster’s query slower. Those approaches would make it easier to reparent comments, though, and they’d support arbitrarily deep trees (whereas the materialized path implementations I’ve seen limit path length).

mrslave3y ago

This comment gave me a flashback to Celko's SQL for Smarties. I believe the updated books are split off into a few smaller books? But the section/book on trees in a relational database helped me greatly once in a galaxy far far away.

CptNibblesworth3y ago

AGE can handle that and recursive ctes can as well, but AGE has mechanisms to handle cyclic graphs as well.
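As a point of comparison on the recursive CTE side (this says nothing about how AGE itself handles cycles), here is a small sketch of a reachability query over a graph that contains a cycle. It assumes a hypothetical edges table and uses SQLite via Python's standard library. The recursion terminates because UNION, unlike UNION ALL, discards rows that are already in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical directed edge list; a -> b -> c -> a forms a cycle.
    CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO edges (src, dst) VALUES
        ('a', 'b'), ('b', 'c'), ('c', 'a'), ('c', 'd');
""")

# UNION (not UNION ALL) drops nodes that were already produced, so the
# recursion stops once no new node is reachable, even with a cycle.
REACHABLE_SQL = """
WITH RECURSIVE reachable(node) AS (
    SELECT ?
    UNION
    SELECT e.dst
    FROM edges AS e
    JOIN reachable AS r ON e.src = r.node
)
SELECT node FROM reachable ORDER BY node
"""

def reachable_from(start):
    """Return the start node plus every node reachable from it."""
    return [row[0] for row in conn.execute(REACHABLE_SQL, (start,))]

print(reachable_from("a"))
```

This trick only works while the recursive rows carry nothing but the node id. As soon as a depth or path column is added, every visit produces a distinct row and an explicit cycle check is needed.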

ramraj073y ago

Why not a trigger that maintains this in a simpler query in a separate table? Sounds more performant to me!

Recursive CTEs sound like something you would do only if the total comment count in your db is not in the six figures or so. What does HN do?

ramraj073y ago

I once achieved it by setting a parent_id string column - for root comments it’ll just be article id. For replies it’ll be “parent_comment.parent_id || parent_comment.id”. Then it’s just a single list no recursion needed to get the full hierarchy at whatever level. Can also be easily migrated to dynamodb and get infinite scaling and zero downtime costs.
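A rough sketch of that path-column idea follows, with a few assumptions of my own: the column stores the full path including the comment's own id, "/" is the separator, and SQLite stands in for the database so the example stays self-contained. All names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (
        id   INTEGER PRIMARY KEY,
        path TEXT NOT NULL,   -- e.g. 'a10/1/2': article, then ancestors, then self
        body TEXT NOT NULL
    );
    CREATE INDEX comments_path_idx ON comments (path);
""")

def add_comment(comment_id, parent_path, body):
    """Insert a comment under parent_path and return the new comment's path."""
    path = f"{parent_path}/{comment_id}"
    conn.execute(
        "INSERT INTO comments (id, path, body) VALUES (?, ?, ?)",
        (comment_id, path, body),
    )
    return path

article = "a10"
first = add_comment(1, article, "first")
reply = add_comment(2, first, "reply to first")
add_comment(3, reply, "nested reply")
add_comment(4, article, "second")
add_comment(5, "a100", "comment on another article")

def fetch_subtree(prefix):
    """Return (id, path) for everything below prefix with one prefix match."""
    # The trailing separator keeps 'a10' from also matching 'a100/...'.
    return conn.execute(
        "SELECT id, path FROM comments WHERE path LIKE ? ORDER BY path",
        (prefix + "/%",),
    ).fetchall()

print(fetch_subtree("a10"))
print(fetch_subtree("a10/1"))
```

Whether the LIKE prefix actually uses the index depends on the database and its collation settings. In PostgreSQL, an index with text_pattern_ops or the ltree extension mentioned earlier in the thread are the usual ways to get an indexed prefix scan.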

mradek3y ago· 7 in thread

Interesting. What are some good extensions for pg? I have only used UUID and postgis.

hans_castorp3y ago

> I have only used UUID and postgis.

If you used "uuid-ossp" to get uuid_generate_v4(), then this is no longer necessary since Postgres 13 as there is now a built-in gen_random_uuid()

https://www.postgresql.org/docs/current/functions-uuid.html

mradek3y ago

Oh wow TIL thank you

somebee3y ago

The ltree extension is fantastic if you have data like comments or any other hierarchical structure.

rdevsrex3y ago

Well, I wouldn't call it interesting, but I like the citext extension for case insensitive comparison.

jschrf3y ago

TimescaleDB

Dowwie3y ago

aws_s3: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_...

ellisv3y ago

pg_stat_statements

faizshah3y ago· 6 in thread

Anyone tried this? How does the performance compare to neo4j and RedisGraph?

I’m about to give RedisGraph a try and I guess I will give this one a go as well.

ysko753y ago

AGE 1.1 should perform better than, or at least similarly to, Neo4j. Not sure about RedisGraph.

tluyben23y ago

Did you try dgraph? For our use cases it won over neo4j. Didn’t try redisgraph.

nde3y ago

We’ve been using Dgraph in production for a few years now.

The project is fine for hobby projects but it is NOT production ready.

Don’t take my word for it, though… I invite you to read through some of the issues reported in their discussion forums and to take a look at their GitHub contributions over the past year.

There was major turmoil in Dgraph Labs (the project’s maintainers) last year which resulted in the CEO and 95% of the engineers exiting the company. They are currently in a rebuilding phase, with limited staff and runway.

There are several critical bugs, which lead to either data loss, data corruption or cluster instability, which the current maintainers have failed to fix. Additionally, their customer support is often either unresponsive or unhelpful (even for paying customers).

Running a Dgraph cluster is expensive, with heavy memory utilization and favoring vertical scaling. If you need scale, then be prepared to spend big.

The documentation is not great and because very few people use this project in production, help is extremely limited.

Best of luck to you should you choose Dgraph and to anyone currently using it already.

AtlasBarfed3y ago

There is also a graph layer for Cassandra. It was based on Titan ... JanusGraph.

I also played around with a graph-document database hybrid when I had downtime, but never got it close to anything usable.

A json document database with relations between documents is basically a property graph. I've seen a lot of the document databases (rethinkdb, orientdb, elasticsearch, etc) that seem close to realizing this too, but no one has run with it.

Most document databases have some sort of nested "walker" api, and if your json doc has properties that are subdocuments, will walk those. That's basically a graph api.

I wrote it as a "streaming api" so a large document/property graph could be serialized out to the client as the lookup engine walked the graph, and you don't need to fully load a complex set of documents in the query layer memory before sending it out to the client.

But I just didn't have the development horsepower to get to the various query and index capabilities. I think the general distributed design was decent and offered hybrid plain-old-table, document, and graph capabilities all in one. And cassandra, PITA that it is, does linearly scale.

jhoechtl3y ago

Didn't Dgraph dissolve? Based on the open functionality discussed on their help forum and the stability issues, I wouldn't touch it any more.

A product that lived for VC money and little more.

tofuahdude3y ago

We benchmarked all 3; redisgraph was fastest by far.

fulafel3y ago· 6 in thread

This is written in C. I wonder how common it is to write PG extensions in safer languages and what would be the most suitable.

I'm somewhat wary of using nontrivial C extensions, having seen so many of them sometimes segfault the backend (e.g. PostGIS). There seem to be PG backend crashes described in this project's issues as well.

Sasasu3y ago

PG has a special memory management scheme called MemoryContext. All memory allocated in a context disappears when that context ends. This means that you can safely skip freeing memory, but also that your memory may be freed in unexpected places. This conflicts badly with the way Rust manages memory, so writing an extension in Rust won't improve things much.

PG also has its own method for creating processes, and creating threads is not possible because the logging system makes heavy use of setjmp().

zasdffaa3y ago

> creating threads is not possible because the logging system makes heavy use of setjmp().

Naive question from a non-c user, setjmp/longjmp just manipulate the stack and since each thread has its own execution stack, that should be completely safe ISTM - so why is it unsafe/impossible? I'm missing something.

jpnc3y ago

There's https://github.com/tcdi/pgx for writing extensions in rust.

nine_k3y ago

Postgres itself is written in C, and I suppose every one of its internal interfaces is in C too. I wonder how many unsafe sections an extension written in Rust would need in order to use these interfaces.

I wish something like Lua + LuaJIT could be used to write such extensions; at least it's memory-safe. OTOH mapping these C interfaces to Lua structures, and making them work with GC may happen to be non-trivial.

fulafel3y ago

PG ships with Lua support: https://www.postgresql.org/docs/current/external-pl.html

(Also Python, Javascript, and Java)

I don't know specifics about the API coverage. It seems this extension mostly just implements new SQL visible functions and data types, which should be doable from those languages as well. Composite types might have to be defined as PG records (or json) instead of C level new PG object types.

anecdotal13y ago

Timescale wrote their Promscale extension for Postgres in Rust. So it exists.

metadat3y ago· 4 in thread

What subset of the openCypher dialect specification will Apache Age support?

Every implementation is all over the place and completely non-portable. And neo4j performance leaves much to be desired.

My personal go-to is RedisGraph paired with RedisInsight for the instant visualizations. It just feels "right" and, while not perfect, is overall intuitive.

bavell3y ago

I've been following RedisGraph with great interest since I first heard about it. Fell in love with cypher but not a big fan of neo4j. Most recently I've been playing with EdgeDB (not cypher-based) which scratches my graphDB itch pretty well but it's still a little too early for me to consider adopting it in a serious project.

I'm glad there seems to be continued interest in graphDBs, I think there's a lot of potential in that space and I'm eagerly awaiting a clear winner to emerge.

jhoechtl3y ago

We might see a clear winner in query language, not product. What is the clear winner in relational databases? SQL certainly but what product?

flymetothemoonOP3y ago

Apache AGE is inspired by AgensGraph, a multi-model database fork of PostgreSQL, so it uses the AgensGraph dialect.

bayesian_horse3y ago

AGE is an extension, not a fork. You can have relational tables and graphs in the same database, probably you can also use both in the same query.

bayesian_horse3y ago· 2 in thread

Always bet on PostgreSQL!

I hope AGE matures a bit in the future. There are lots of use cases for Graph Databases. One I'm interested in is bitemporality. It's easy to use ltree or CTE for tree-like structures. But what if you want to move nodes in the graph at certain times? Like a device being scheduled to be in different rooms across time. And also the history of those schedules. In a graph database you can label edges with temporal attributes and then query for a view of the graph at a certain point in time and in a certain history state by filtering the edges.
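To make the edge-filtering idea concrete, here is a small in-memory sketch in plain Python rather than AGE or Cypher. Each edge carries a valid-time interval (when the device is scheduled to be in the room) and a recorded-time interval (when that schedule was the current belief). The Edge class, its field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

FOREVER = datetime.max  # open-ended upper bound for "still current"

@dataclass(frozen=True)
class Edge:
    device: str
    room: str
    valid_from: datetime      # schedule applies from here (inclusive)
    valid_to: datetime        # ... up to here (exclusive)
    recorded_from: datetime   # this version of the schedule was entered here
    recorded_to: datetime = FOREVER  # ... and superseded here (exclusive)

def graph_as_of(edges, valid_at, known_at):
    """Edges that apply at valid_at according to what was recorded at known_at."""
    return [
        e for e in edges
        if e.valid_from <= valid_at < e.valid_to
        and e.recorded_from <= known_at < e.recorded_to
    ]

edges = [
    # Recorded on Jan 1: projector is in room A for all of March.
    Edge("projector", "room-a", datetime(2023, 3, 1), datetime(2023, 4, 1),
         datetime(2023, 1, 1), datetime(2023, 2, 1)),
    # Correction recorded on Feb 1: room A until Mar 15, room B afterwards.
    Edge("projector", "room-a", datetime(2023, 3, 1), datetime(2023, 3, 15),
         datetime(2023, 2, 1)),
    Edge("projector", "room-b", datetime(2023, 3, 15), datetime(2023, 4, 1),
         datetime(2023, 2, 1)),
]

def rooms_of(device, valid_at, known_at):
    """Rooms the device is scheduled in, for one point in both time axes."""
    return [e.room for e in graph_as_of(edges, valid_at, known_at)
            if e.device == device]

march_20 = datetime(2023, 3, 20)
print(rooms_of("projector", march_20, known_at=datetime(2023, 1, 15)))
print(rooms_of("projector", march_20, known_at=datetime(2023, 2, 15)))
```

The two calls ask the same question about March 20 but against two different history states, which is the "view of the graph at a certain point in time and in a certain history state" described above.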

stevesimmons3y ago

That's a really interesting idea.

Can you recommend any good references for bitemporality in graph dbs?

bayesian_horse3y ago

There's XTDB

atemerev3y ago· 2 in thread

"Apache AGE is currently being developed for the PostgreSQL 12 release"

Well sorry, we have PostgreSQL 15 already.

feike3y ago

If you keep reading, the words following your quote make it clear that PG15 should be supported:

"and will support PostgreSQL 13 and all the future releases of PostgreSQL."

gutbasokchok3y ago

and I believe the more recent the version is, the less stable it is. Sure there will be additional nice features, but older versions are still in progress. And yeah, if you kept reading, AGE sounds like it will support more versions in the future.

gyre0073y ago· 2 in thread

This is a great project, but last time I checked it was lacking a lot of CYPHER features and wasn’t moving very fast forward. But I’m hoping it will catch up to the point it will become useful.

robertlagrant3y ago

Cypher as in the Neo4J query language?

bayesian_horse3y ago

There is now an OpenCypher specification and AGE seems to strive towards supporting most of it.

sandGorgon3y ago· 2 in thread

has anyone here worked on graph neural networks ? basically creating embeddings for node based on their edge connectivity (or reachability) and using that for neural networks ?

how do you do this at scale? it's generally an NP-hard problem, but wondering whether something like AGE helps.

not sure how Google, etc or even someone on fraud detection does this at scale

ArnoVW3y ago

You subsample. One package I used made N 'random walks' for each node. The random walks are written out as 'sentences', where the node id's are words.

That results in a huge text file, that you then embed as if it were a normal text. The result is a normal 'word embedding' where the words are in reality the node id's. Works like a charm. Highly scalable.

https://github.com/dwslab/jRDF2Vec
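A toy sketch of the walk-generation step described above, in plain Python. This is not jRDF2Vec's implementation; the function name, its parameters, and the tiny graph are made up for illustration, and the embedding step (feeding the sentences to a word2vec-style trainer) is left out.

```python
import random

def random_walks(graph, walks_per_node, walk_length, seed=0):
    """Return walks_per_node random walks per node, each written as a sentence.

    graph maps a node id to the list of node ids it links to. A walk stops
    early when it reaches a node without outgoing edges.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    sentences = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph.get(walk[-1], [])
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            # Node ids play the role of words, the walk is the sentence.
            sentences.append(" ".join(walk))
    return sentences

graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": [],          # isolated node: its walks contain only itself
}

sentences = random_walks(graph, walks_per_node=2, walk_length=4)
for sentence in sentences:
    print(sentence)
```

The cost is the number of nodes times walks_per_node times walk_length, which is the knob the subsampling argument relies on: the corpus grows linearly with the node count no matter how dense the graph is.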

sandGorgon3y ago

really ? so u keep subsampling as the data becomes larger and larger.

instead of ...well...throwing more hardware that seems to be easier and easier these days.

P.S. not trolling. im genuinely wondering if there is a better way to split the problem heuristically

gorlomi3y ago· 2 in thread

How are graph edges and nodes exposed to the Postgresql type system?

flymetothemoonOP3y ago

The AGE documentation has some basic examples of how Cypher and SQL can be mixed. Regarding nodes and edges: they are sub-types of a type called Agtype. More details are in the documentation.

https://age.apache.org/age-manual/master/index.html

CptNibblesworth3y ago

AGE uses Agtype, which is a superset of JsonB for its purposes; the primary additions are the edge and vertex types.

Kalanos3y ago· 1 in thread

Does this construct edge tables (many-to-many) for every relationship behind the scenes? if so, can attributes be added to the edges?

robertlagrant3y ago

I'd hope so! You can do that in relational DBs.

Dowwie3y ago· 1 in thread

I create a DAG using recursive sql. I assume that saving data in a graph and querying the graph with a native graph language would be faster. Has anyone benchmarked performance differences between the two?

bayesian_horse3y ago

Depends on the use case. Recursive SQL can be good enough, maybe even faster, for certain use cases. The problem isn't so much the query language but the indexing. Graph engines index the nodes and edges in a particular way so that traversal is fast.

Most examples of recursive SQL I've seen involve nodes in exactly one table and with exactly one kind of relationship/edge (for example a tree with "parent" edges). Graph DBs allow you to relate multiple different types of nodes using multiple kinds of edges. The edges can have queryable attributes, like an intermediary table in a many-to-many relationship. And all of that is still indexed efficiently.

dang3y ago

Apache Age: A Graph Extension for PostgreSQL - https://news.ycombinator.com/item?id=26345755 - March 2021 (45 comments)

Apache AGE: PostgreSQL-based graph database - https://news.ycombinator.com/item?id=26309560 - March 2021 (11 comments)

twaway233y ago

I'm considering using a graph database for a SaaS product. If I used Apache AGE, I would probably have a "graph" for each customer to partition the data. Are there any downsides or limitations to having thousands of separate graphs?

From the documentation it seems that each graph will use a separate "namespace" in Postgres. Are there any performance costs of switching namespaces for each query?

Or do you recommend that we use a single graph with a label per customer? This option seems like it could open up some security issues if some queries forget to add this label. By using a separate graph per customer, the query will need to have a valid graph name for a customer to return any data. If it is filtered by a label, you can easily forget to add it and think everything is OK because it actually returns results.

mark_l_watson3y ago

I might try running this with Docker just to try it out, but probably this is the type of project to watch and wait for maturity.

I am a big fan of graph databases. Professionally I have used RDF data stores with SPARQL queries and Google’s Knowledge Graph with a pattern matching query mode. I play around with Neo4J, but no one has paid me to use it yet.

I think it very likely that in a year or two AGE will get better Cypher query language support and other changes, and should be a wonderful platform for combining relational and graph data stores.

ysko753y ago

Apache AGE Discord

https://discord.com/invite/NMsBs9X8Ss

canadiantim3y ago

I’ve been waiting to see Apache AGE on HN. Looks amazing, thanks for all the great work!

enugu3y ago

Wish Postgres had an optional data type (tagged union). Does anyone know of an extension which implements that?
