Forking an industrial-grade tool means the entire lifespan of the product becomes your responsibility to your client. Tracking the major upgrade changes might be a pain in the arse, but that's nothing compared to tracking every security and data-loss fix that flows through the Postgres community.
It's not just developer time that's the cost here. They'd have had to compile the whole Postgres+Citus database for every platform they support, test it, and distribute packages, all in a timely manner. Think of all the CPU cycles and bandwidth they're saving by only having to compile an extension against public headers.
Functioning as an extension means Postgres and its distributors (e.g. Ubuntu) are the ones responsible for keeping Postgres alive and secure. Citus only have to support their own thing.
Why aren't they talking about how much this move is saving them day-to-day? There's no shame in being efficient.
But what you're saying (which wasn't immediately obvious, and correct me if I'm wrong) is that your users are using your database product, not Postgres, so you can hold them back as long as you like while they're on a forked product. They won't be carried away by an automatic update, and it's much harder for them to jump ship.
And while there is some truth to that, it comes with a karmic cost. People picked you because you were based on their favourite, industry-tested database. If you slip behind in features, or (more importantly) can't backport security fixes instantly, you're dead.
The stuff Citus has been landing in PostgreSQL is fantastic.
They call out a number of forks that do quite nicely. I work for Pivotal, which sponsors Greenplum. Companies pay handsomely for the capabilities it brings to the table.
But they are right that rebasing is a nightmare. My understanding (possibly wrong) is that the broad selection of APIs that makes an extension-only approach viable did not appear in PostgreSQL until more recent versions -- anyone who forked earlier (such as Greenplum) has to first catch up and then migrate.
I do know that the Greenplum team have decided to keep rebasing until they are working against mainline. It is, as you might imagine, a slow process: rebasing millions of lines of code a release at a time is not the easiest task on earth. But maintaining a fork will, in the long run, be harder.
Even for smaller changes, it is well known that you're better off contributing back to the free-software project, because having your patch included and maintained there is a lot less trouble than maintaining your fork and updating your patches every few months.
One exception might be one-off changes, but we all know that nothing is more permanent than the temporary.
The only real exception is when your patch is superseded by another patch (or a better solution). Then you maintain your private patch only until the next version is finalized.
In other words: does Postgres 10 offer the same clustering features as Citus does?
Citus is an extension that takes several database nodes and makes them appear as a single logical database server (at the table level, by automatically sharding tables on a chosen column).
https://wiki.postgresql.org/wiki/Replication,_Clustering,_an...
And Citus is the first link in that list.
Data for different customers can be stored on separate nodes, so your DB is not limited by the capacity of one node.
In regular PG, all data needs to fit on a single node.
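To make that concrete, here's a minimal Python sketch of the idea behind column-based sharding: each row is routed to a node by hashing its distribution column, so one customer's rows colocate on one shard. (Citus itself does this in C inside Postgres; the node names, helper functions, and hash choice here are illustrative, not Citus's actual implementation.)

```python
import zlib

# Hypothetical set of worker nodes in the cluster.
NODES = ["node-0", "node-1", "node-2", "node-3"]

def node_for(customer_id: str, nodes=NODES) -> str:
    """Pick a node by hashing the distribution column value.
    crc32 is stable across runs, unlike Python's builtin hash()."""
    return nodes[zlib.crc32(customer_id.encode()) % len(nodes)]

def insert(row: dict, shards: dict) -> None:
    """Route a row to its shard based on the distribution column."""
    shards.setdefault(node_for(row["customer_id"]), []).append(row)

shards = {}
insert({"customer_id": "acme", "order": 1}, shards)
insert({"customer_id": "acme", "order": 2}, shards)
insert({"customer_id": "globex", "order": 1}, shards)
# All of acme's rows land on the same node, so per-customer queries
# touch a single shard; different customers may live on different nodes.
```

The payoff is the one described above: the dataset as a whole can exceed any single node's capacity, while queries filtered on the distribution column still hit only one node.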
Is there any interest in RDS as a service? So basically setting up and running a completely fault-aware Postgres cluster on any infrastructure, public or private?
Thanks for your feedback :)
Since it uses BOSH, you can deploy to a wide range of targets. OpenStack, vSphere, AWS, Azure, GCP and I forget what else.
Disclosure: I work for Pivotal.
Basically as if I hired a contractor to install, monitor and upgrade, but automated. Existing services charge too much since they resell VMs and storage, while also being less flexible with access and performance.
There's also the rise of Kubernetes (with operators, Helm charts and persistent storage) that takes away much of the complexity. By version 2.0, it should make it easy to turn any legacy single-node system into a fault-tolerant service.
Ideally, I would just need an SSH key on your machines and the ability to open an SSH tunnel through the firewall to scrape metrics.
Ideally, the metrics would also be exposed back to the customer.
I am not a big fan of containers when working with data that is irreplaceable, but the use of k8s may really help.
If I could get something like that on Digital Ocean I’d be all over it.
They are closed-source and require enterprise licensing based on a RAM quota, so automated cloud provisioning isn't simple. They do have their own MemSQL cloud offering, so you might inquire into that. Also, MemSQL Ops is probably the easiest and most reliable operations software for any database; it takes just a few clicks to install and upgrade your cluster.
By the way, I don't have any experience with MemSQL.