Or they could just release their own fork under an open license and focus on adding features.
But that's not how they work. All of their contributions are pushed upstream, which is a considerable effort given how conservative Postgres is about accepting new functionality.
Aside from that: there are 2ndQuadrant employees in the #postgres IRC channel, helping people with day-to-day support issues. Support is their core business, and yet they still help people for free (within reason). This is bloody impressive.
If I'm ever at a point where I need help with a Postgres issue, they will be at the very top of the list of companies I would consider.
Thank you very much for all that you are doing.
Kudos to them.
Over here, we only started seriously thinking about what we were doing once we were handling on the order of 10K transactions per second.
Once you are at that level, you're probably going to need optimizations specific to your application, and a generic database hosting provider might not be able to help you anyway.
I get that as a startup you don't have people for everything, but can you really afford to outsource the knowledge about the central piece of your application, where all the value is stored?
Hosting the DB has never been the core knowledge about the central piece of the application. You may disagree (and I respect that), but for me it has been similar to building a RAID-10 dedicated server vs. using AWS.
You can argue that storing data resiliently is a critical part of the organization - but at what stage? In the first 3 years of a startup, you are iterating on the product. You are pretty much agonizing over drop rates and conversion rates for every single minute of your life. The possibility that the database will crash NEEDS to come a distant second.
Dropbox has only just moved out of AWS. Storage was probably the most critical part of Dropbox, but it chose (rightly) to focus on customers first.
What you are talking about will come - but it will come after some time. And until then I would love to pay some Postgres devs to run a hosted DB for me... like RDS.
RDS is the most expensive thing we pay for. It's worth it.
1375s with 0 workers
131s with 10 workers
56s with 30 workers
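For anyone curious what a test like that looks like, here's a rough sketch (the table name is made up, and the setting controlling the worker count has gone by different names during development; in released versions it is max_parallel_workers_per_gather):

    -- serial baseline: no parallel workers
    SET max_parallel_workers_per_gather = 0;
    EXPLAIN ANALYZE SELECT count(*) FROM big_table;

    -- same query with parallel workers enabled; the plan should show a Gather node
    SET max_parallel_workers_per_gather = 30;
    EXPLAIN ANALYZE SELECT count(*) FROM big_table;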
There is also a point at which Postgres will have enough of the features most businesses need that they'll choose it even though it doesn't match Oracle on a feature-by-feature basis.
We have hit peak Oracle. From this point forward it's going to be hard for Oracle to regain momentum. Expect a lot of FUD against Postgres - the more of it you see, the more worried you know Oracle execs are becoming.
The other thing is that some Oracle features only work on Exadata or other engineered systems - hybrid columnar compression, for one, and query offload, as well as some forms of the in-memory stuff. The problem is that these are very high-end features and most of us live happily without them. Postgres is going after the bread-and-butter of the Oracle DB market while Exadata is the exclusive cheese shop. Much smaller market, even if the margins are better.
I wouldn't be so sure. Oracle has deep pockets, and I'm not sure there's anything PostgreSQL wants to roll out that Oracle couldn't pay to prioritize and have done sooner.
> There is also a point at which Postgres will have enough of the features most businesses need that they'll choose it even though it doesn't match Oracle on a feature-by-feature basis.
No argument there.
The problem commercial businesses have when it comes to open source is that you might be able to reduce customer take-up, but you can't compete with it like you would with other businesses. In the closed-source world you can purchase a company and shut down its product and thus kill off the competition.
As Microsoft have found, with open source software, that's not possible.
The other unfortunate thing for Oracle is that when they attack Postgres they have to publish lists of their competitive advantages. All this does is give Postgres developers a to-do list, and they then work towards implementing the features that matter.
I'm far and away not a database hacker, but have read parts of the code at various times to better understand what I was seeing. The codebase is incredibly well written and organized, and the documentation (both developer and user) is top-notch.
The only problem I can think of with using it in a course would be choosing what to focus on with only a semester. Add in the long revision history and I think there are also multiple theses in there on the sociology of open source.
[1] http://postgrespro.ru/education/courses/hacking [2] https://www.youtube.com/watch?list=PLaFqU3KCWw6Jfb8IBNk3hZ07...
Personally, I found it a delight to work with, and that has translated into a great respect for the product itself. It is no great surprise that the pace of development stays strong and that engineers gravitate toward it; it's set up for exactly that.
http://rhaas.blogspot.com/2015/11/parallel-sequential-scan-i...
Also, Robert is not the author of the patch; he reviewed and committed it. The actual authors are listed in the commit message: David Rowley and Haribabu Kommi.
The machine has 4x E5-4620, so 32 physical cores. And with 30 workers it gets ~80% of the theoretical speedup. Not bad, I guess.
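Back-of-the-envelope, using the timings quoted above (the serial run vs. the 30-worker run):

    SELECT 1375.0 / 56        AS speedup,    -- ~24.6x over the 0-worker run
           (1375.0 / 56) / 30 AS efficiency; -- ~0.82, i.e. roughly 80% of linear scaling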
That assumes I run PostgreSQL already, which I don't. I am interested in possibly switching at some point if it's worthwhile, but it's hard to muster the effort to do concerted testing of a representative sample of my data, including possibly changing how queries are done to take advantage of specific features, when I have little information to go on.
Not that I expect PostgreSQL to do in-depth analysis of everything, but it would be great from both a promotional and technical standpoint if there were something like "we've seen something like X% speedup of queries utilizing Y, and up to Z% speedup in extreme cases." I mean, I assume they at least have rudimentary numbers for this, otherwise they would be making blind changes without knowing whether it improved or degraded performance. Providing just enough to get people interested in doing their own benchmarking (and possibly publishing the results) would be great for everyone.
Edit: One of the top comments is actually what I'm talking about (but apparently for a different feature). So it does get done, which is really nice. :)
Edit2: Now there's the link to the blog post for this feature. :)
It could be a 1% speed up from insanely fast to slightly more insanely fast, or a 100% speed up from unbearably slow to just extremely annoyingly slow.
I gather that people considering migrating who want some reassurance before investing time in testing should be more interested in such things as benchmarks that show whether PostgreSQL can saturate your hardware for various query types, benchmarks that compare its query planner against that of competitors, and benchmarks that show how well it works under load.
The GPG signing key used by the apt repo for Debian and derivatives is also served over HTTPS @ https://www.postgresql.org/media/keys/ACCC4CF8.asc, and the instructions for use direct you to install it as such.
At this point it literally doesn't matter whether downloads are delivered over HTTPS, aside from anonymity (which is almost moot, because you are obviously downloading PostgreSQL or one of the few related packages in these repositories), since package signatures are verified.
But in a production environment, you always want to use a package provided by your OS vendor. Postgres has excellent packagers for both Debian and RHEL-flavored distros (and I imagine more), and you really want to have the system-level considerations thought through by someone who knows what needs to be thought about.
My OS's install mechanism downloads the source tarball. It does authenticate the download, of course. The point being you don't know what platform someone is using, and the source may be the only way they can install PostgreSQL.
> PostgreSQL wants
It doesn't matter what they want - if the source is available for download, it will be used. Unauthenticated downloads are an "attractive nuisance" that puts users at risk. The actual download links[1] at www.postgresql.org do use https, but the HTML that contains the download URLs irresponsibly redirects https requests back to http. While the download of the actual source tarball is authenticated, the URL to that download can easily be modified in transit.
I'm sure that someone who has the capability to not only tap into but also modify traffic on the fly won't have a problem obtaining a valid certificate from one of the hundreds of CAs that everyone has in their web browsers.
If anything, I feel HTTPS actually hurts, because it gives you a false sense of security and you're more likely not to validate files with GPG, which you should do whether it is HTTPS or not.
I would be suspicious that it's the very first step that poses the most danger: there's little way to know whether minor-distro-X is "properly" hosted at minordistrox.com or distrox.org or even distrox.github.io, other than by blindly trusting the relative PageRanks of each.
---
Something that would be very nice, I think, would be a way to rely on the package-signing infrastructure of the OS you've already got (and trust), to guarantee for you the copies of any new OS images—even for different OSes!—you download. Sort of the same way you download new versions of OSX through the Mac App Store, but going a bit further.
Now that I think about it, the browser-preloaded HSTS list might do the trick... if it was coupled with a mapping of 'important well-known downloads' to a single (HSTS-preloaded) domain you should be allowed to get those downloads from. (Presumably with heuristic detection, so it could find "something looking mostly, from its metadata, like a Debian ISO image.") Then your browser would just tell you you're being phished if you're trying to get a Debian ISO from debianisamazing.info.
And I'm still here waiting for 9.5 to arrive in Amazon RDS :(
Oh, it does have upserts now. Awesome =]
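For anyone who hasn't seen the 9.5 syntax yet, an upsert looks roughly like this (table and column names are made up; it needs a unique constraint on the conflict column):

    -- insert a new row, or bump the counter if the key already exists
    INSERT INTO counters (name, hits)
    VALUES ('homepage', 1)
    ON CONFLICT (name)
    DO UPDATE SET hits = counters.hits + EXCLUDED.hits;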