UPDATE table SET some_jsonb_column['person']['bio']['age'] = '99';
[0] https://erthalion.info/2021/03/03/subscripting/

JSONB capabilities in Postgres are amazing, but the syntax is really annoying - for example, I'm forever mixing up `->` and `->>`. This new syntax feels far more intuitive.
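For anyone else who mixes these up, a quick side-by-side sketch (table and data invented for illustration):

```sql
-- Hypothetical table for illustration
CREATE TABLE docs (data jsonb);
INSERT INTO docs VALUES ('{"person": {"name": "Ada"}}');

-- `->` returns jsonb, `->>` returns text
SELECT data->'person'->'name' FROM docs;   -- "Ada" (jsonb, with quotes)
SELECT data->'person'->>'name' FROM docs;  -- Ada (text)

-- The new Postgres 14 subscript syntax reads like most languages
SELECT data['person']['name'] FROM docs;   -- "Ada" (jsonb)
```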
Glad to see this super intuitive and familiar syntax added. Will make writing these updates a lot, lot easier. Not even close!
For updates, it looks nice I guess.
But this is going to be a classic example of bad design. Databases are a bad place to be storing JSON, which is a good interface and a bad storage standard. It is pretty easy to see how JSON will play out: some bright young coder will use JSON because it is easier, then over the course of 12 months discover the benefits of a constrained schema, and then have a table-in-a-table JSON column.
It isn't so out there to think that ongoing calls for JSON support will lead Postgres to re-implement tables in JSON. We've already got people trying to build indexes on fields inside a JSON field.
This is needless complexity engineered by people who insist on relearning schemas from scratch, badly, rather than trusting the database people who say "you need to be explicit about the schema, do data modelling up front".
I think PostgreSQL has always been very pragmatic. It's supported JSON natively since 9.2 (Sep 2012).
> Databases are a bad place to be storing JSON
You're right that "mature" features and projects have a very good understanding of the schema. But not everything is like that.
Suppose I want to collect info from the Github API about a bunch of repos. I can just store the entire JSON response in a table and then query it at my leisure.
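A sketch of that workflow (table and column names are invented; the specific GitHub response fields shown are assumptions, so treat the details loosely):

```sql
-- Dump raw API responses into a single jsonb column, no up-front schema
CREATE TABLE gh_repos (
    fetched_at timestamptz DEFAULT now(),
    payload    jsonb
);

-- Later, query at leisure
SELECT payload->>'full_name' AS repo,
       (payload->>'stargazers_count')::int AS stars
FROM gh_repos
ORDER BY stars DESC;
```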
There's also something to be said for contiguous access. Joining tons of little records together has performance problems. Composite types and arrays can also fill this void, but they both have their own usability quirks.
That's also why in 99% of cases PostgreSQL users reach for jsonb as the storage format, which is binary and compressed.
> This is needless complexity engineered by people who insist on relearning schemas from scratch
No, this is the right tool for situations where schemas are polymorphic, fluid, or even completely absent (like raw third-party data). I love SQL and following normal forms, and it is the right tool for most situations, but not all.
If you've got hierarchical data and you just want to store, update and retrieve it as a whole (which is my use case), JSON is a good choice. Granted, it could be stored as a string/blob in my case. I don't really need to search within.
You don't even need JSONB to commit war crimes on a Postgres database. There's many things that Postgres can do, but probably shouldn't be done:
- Storing "foreign keys" in an array column, instead of using a join table
- Storing binary files as base64 encoded strings in text columns
- Using a table with `key` and `value` string columns instead of using redis
- Pub/sub using NOTIFY/LISTEN
- Message queueing
- Other forms of IPC in general
- Storing executable code
- God tables
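To make the first item concrete, a sketch of the array-of-IDs anti-pattern next to the join table it should be (all names invented):

```sql
-- Anti-pattern: "foreign keys" in an array -- no referential integrity,
-- awkward joins, no cascade behaviour
CREATE TABLE posts_bad (id int PRIMARY KEY, tag_ids int[]);

-- What it should be: a join table with real foreign keys
CREATE TABLE tags  (id int PRIMARY KEY, name text);
CREATE TABLE posts (id int PRIMARY KEY);
CREATE TABLE post_tags (
    post_id int REFERENCES posts(id),
    tag_id  int REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id)
);
```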
Even when trying to use Postgres appropriately, plenty of engineers don't get it right: unnecessary indices, missing indices, denormalised data, etc.
This isn't unique to Postgres, or relational databases in general. Any form of storage can and will be used to do things it's not designed or appropriate for. You can just as easily use S3 or Elasticsearch for message queuing, and can even find official guides to help you do so. Go back 20 years or so, and you can find implementations of message buses using SOAP over SMTP.
The problem isn't JSONB (or any other feature). It's bad engineering. Usually it's an incarnation of Maslow's Hammer: when all you have is a hammer, everything looks like a nail.
JSON makes perfect sense in a database that already supports all of: BLOB, TEXT, XML, ARRAY, and composite datatypes, including any datatype, including those on this list, for members of ARRAY and composites.
OTOH, Postgres has had XML since v8.2 (2006) and JSON since 9.2 (2012), and “tables in <supported structured serialization format>” hasn’t happened yet, even as discussion item AFAIK, so perhaps it would be bad, but even so it seems to be just fantasizing something to worry about.
The reason is that with some projects/data it's hard to be explicit about the schema, which is why NoSQL had its popularity phase.
Now most applications don't have either entirely structured or entirely unstructured data, they will have a mix - so it's absolutely brilliant for one tool to do both. If they didn't support JSON I have a strong suspicion that they wouldn't have had some of the growth we have seen for Postgres across the last few years.
Do any other entrenched software projects come to mind? The only thing comparable I can think of are Git and Linux.
There's been a lot of exciting work in this area over the last decade or so. Andy Pavlo's classes are great surveys of the latest work: https://15721.courses.cs.cmu.edu/spring2020/
CosmosDB is an example of a relational (more properly, multi-paradigm) database with a quite different architecture from the classic design that has moved into production status quite rapidly.
FaunaDB and CockroachDB are moving with solid momentum too.
- scaling is non-trivial (you can't just add a node and have PostgreSQL automagically Do The Right Thing™)
- you can only have so many connections open to the database, causing issues with things such as AWS Lambda
- I don't remember if this was changed, but I got the impression a while ago that setting up dynamic DB users was a bit cumbersome (plugging PostgreSQL into AD/LDAP)
All of these require a different architecture so expect to see newer databases push things even further.
Even Croach would be a massive branding improvement.
This is similar to how gimp is a terrible brand.
The machine being used for this benchmark has 96 vCPUs, 192G of RAM, and costs $3k/mo.
My business runs just fine on a 3.75G, 1 vCPU instance. But idle connections eat up a huge amount of RAM and I sometimes find myself hitting the limits when a load spike spins up extra frontend instances.
Sure, I could probably set up pgbouncer and some other tools, but that's a lot of headache. I'm acutely aware that MySQL (which I dislike because it has no transactional DDL) does not suffer from this issue. I also don't see this being solved without a major rewrite, which seems unlikely.
So Postgres has at least one very serious fault that makes room in the marketplace. The poor replication story is another.
As for the per-connection memory usage, the big question is whether there really is a problem (and perhaps if there's a reasonable workaround). It's not quite clear to me why you think the issues in your case are due to idle connections, but OK.
There are two things to consider:
1) The fixed per-connection memory (tracking state, locks, ..., a couple kBs or so). You'll pay this even for unused connections.
2) Per-process memory (each connection is handled by a separate process).
It's difficult to significantly reduce (1) because that state would be needed no matter what the architecture is, mostly. Dealing with (2) would probably require abandoning the current architecture (process per connection) and switching to threads. IMO that's unlikely to happen, because:
(a) the process isolation is actually a nice thing from the developer perspective (less locking, fewer data races, ...)
(b) processes work quite fine for a reasonable number of long-lived connections, and connection pools address a lot of the other cases
(c) PostgreSQL supports a lot of platforms, some of which may not have very good multi-threading support (and supporting both architectures would be quite a burden)
But that's just my assessment, of course.
Elasticsearch is underrated here, IMO. Yes, there are alternatives for simple fulltext search. But there’s a lot more it can do (adhoc aggregations incorporating complex fulltext searches, with custom scripted components; geospatial; index lifecycle management) and if you’re using those features, there’s nothing else comparable.
It’s pretty stable, too, once you’ve got the cluster configured. We don’t have outages due to problems with Elasticsearch.
I’m currently evaluating typesense vs ES for a fts project and typesense is winning so far by simply being “not painful” to deal with.
They are great in some cases and terrible in others, and over time, use cases push database systems into their worst cases. Use cases rarely stay in the sweet spot of a special-purpose system.
That being said, if the integration is great, and/or the special system is a secondary one (fed from a general-purpose system), then it's often fine.
It’s totally understandable that you’d need developers to have expertise in patterns and anti-patterns, as well as needing an expert to set things up in the first place, but you shouldn’t have to have a dedicated ES monitoring / tuning / babysitting team like Oracle DBAs of yore. That you do, means it isn’t there yet as a product.
SQLite.
But saying "so much better" is too strong.
https://www.postgresql.org/docs/devel/btree-implementation.h...
One benchmark involving a mix of queue-like inserts, updates, and deletes showed that it was practically 100% effective at controlling index bloat:
https://www.postgresql.org/message-id/CAGnEbogATZS1mWMVX8FzZ...
The Postgres 13 baseline for the benchmark/test case (actually HEAD before the patch was committed, but close enough to 13) showed that certain indexes grew by 20% - 60% over several hours. That went down to 0.5% growth over the same period. The index growth is also much more predictable, in that it matches what you'd expect for this workload if you thought about it from first principles. In other words, you'd expect about the same low amount of index growth if you were using a traditional two-phase locking database that doesn't use MVCC at all.
Full disclosure: I am the author of this feature.
MySQL and Oracle exist. Mercurial and Perforce exist. I'm not sure it's a terrible stretch to compare git and postgres.
There's almost no reason to pick MySQL for a new project.
Excel for business spreadsheets.
Java for enterprise server software.
Not an expert, but it is my understanding that Julia is becoming an ever more serious competitor day by day.
> Excel for business spreadsheets.
Honest question, what does LibreOffice miss compared to Excel? In any case, (again not an expert) spreadsheets seem quite inferior to a combination of Julia, CSV and Vega (Lite); although there are certainly more people that are familiar with operating Excel.
Big corporations are horribly inefficient, and enterprise software is necessarily so as a result... if you're saying Java is terrible by nature of it being the go-to for enterprise, then that makes sense. It took 20 years for it to swap places with COBOL and I expect it will be something else in 20 more.
Last time I was responsible for setting up a HA Postgres cluster it was a garbage fire, but that was nearly 10 years ago now. I ask every so often to see if it has improved and each time, so far, the answer has been no.
Otherwise use MySQL, Oracle, MongoDB, Cassandra etc if you want to run it on your own.
Any other database that invested in a native and supported HA/clustering implementation.
Logical replication for streaming to read-only replicas and streaming replication for fail-over. My client app still needs to know try-A-then-try-B (via DNS or config).
I manage both a cockroachdb cluster and a few PG setups. Our Postgres instances have streaming replication to a standby with barman running on the standby. They are night and day.
Sure 2021 PG is way better than 2010 PG. But relative to available options, it's much worse.
And 200+ other improvements in the Postgres 14 release!
These are just some of the many improvements in the new Postgres release. You can find more on what's new in the release notes, such as:
The new predefined roles pg_read_all_data/pg_write_all_data give global read or write access
Automatic cancellation of long-running queries if the client disconnects
Vacuum now skips index vacuuming when the number of removable index entries is insignificant
Per-index information is now included in autovacuum logging output
Partitions can now be detached in a non-blocking manner with ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
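A couple of those in action (role, table, and partition names are invented; the statements follow the Postgres 14 syntax described in the release notes):

```sql
-- Grant read-only access to everything with one predefined role
GRANT pg_read_all_data TO reporting_user;

-- Detach a partition without holding a blocking lock on the parent
ALTER TABLE measurements
    DETACH PARTITION measurements_2020 CONCURRENTLY;
```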
The killing of queries when the client disconnects is really nice imo -- the others are great too.

I see no mention of addressing transaction id wraparound, but these are in the release notes:
Cause vacuum operations to be aggressive if the table is near xid or multixact wraparound (Masahiko Sawada, Peter Geoghegan)
This is controlled by vacuum_failsafe_age and vacuum_multixact_failsafe_age.
Increase warning time and hard limit before transaction id and multi-transaction wraparound (Noah Misch)
This should reduce the possibility of failures that occur without having issued warnings about wraparound.
Clearly it doesn't eliminate the possibility of wraparound failure entirely. Say for example you had a leaked replication slot that blocks cleanup by VACUUM for days or months. It'll also block freezing completely, and so a wraparound failure (where the system won't accept writes) becomes almost inevitable. This is a scenario where the failsafe mechanism won't make any difference at all, since it's just as inevitable (in the absence of DBA intervention).
A more interesting question is how much of a reduction in risk there is if you make certain modest assumptions about the running system, such as assuming that VACUUM can freeze the tuples that need to be frozen to avert wraparound. Then it becomes a question of VACUUM keeping up with the ongoing consumption of XIDs by the system -- the ability of VACUUM to freeze tuples and advance the relfrozenxid for the "oldest" table before XID consumption makes the relfrozenxid dangerously far in the past. It's very hard to model that and make any generalizations, but I believe in practice that the failsafe makes a huge difference, because it stops VACUUM from performing further index vacuuming.
In cases at real risk of wraparound failure, the risk tends to come from the variability in how long index vacuuming takes -- index vacuuming has a pretty non-linear cost, whereas all the other overheads are much more linear and therefore much more predictable. Having the ability to just drop those steps if and only if the situation visibly starts to get out of hand is therefore something I expect to be very useful in practice. Though it's hard to prove it.
Long term, the way to fix this is to come up with a design that doesn't need to freeze at all. But that's much harder.
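For anyone who wants to keep an eye on this themselves, the standard way to see which tables are furthest from their last freeze is a catalog query along these lines (a sketch, not from the thread):

```sql
-- Tables whose oldest unfrozen XID is furthest in the past;
-- the closer age() gets to ~2 billion, the closer you are to wraparound
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 5;
```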
It's a pity this wasn't listed in the announcement as I think a lot of people are interested in this issue.
>> Long term, the way to fix this is to come up with a design that doesn't need to freeze at all.
Do you know if anyone is turning their attention to this or is it not currently being tackled by anyone?
Sweet! I often screw up a query and need to cancel it with
pg_cancel_backend(pid)
because Ctrl-C rarely works. With this I can just ragequit and reconnect. Sweet!

The problem usually isn't that it doesn't work ever, just that it can take a very long time, especially if the query is reading some crazy amount of data. I've always found pg_cancel_backend() to be almost instant though.
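For reference, finding and cancelling a stuck backend from another session looks something like this (the pid shown is a placeholder):

```sql
-- Find the offending backend's pid
SELECT pid, state, query
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid();

-- Cancel just the running query (the connection survives)
SELECT pg_cancel_backend(12345);  -- pid from the query above

-- Or, more drastically, terminate the whole backend
SELECT pg_terminate_backend(12345);
```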
Episode website: https://www.dataengineeringpodcast.com/postgresql-data-wareh...
Direct: (apple) https://podcasts.apple.com/us/podcast/data-engineering-podca...
In general I find postgres "just works" a lot more than MySQL. MySQL has a really bad habit of sticking with bad defaults for a long period, while having better configuration available. On the other hand postgres devs actively remove/change defaults so you're always getting the best it has to offer.
If you pick one, and you don't like it there are plenty of tools to change between them. If you're curious you could even deploy both of them.
IMO, the biggest shock in the MSSQL/MySQL to PostgreSQL migration was not having 1 or 2 specific files per database, especially if you used to back up the files instead of doing a formal database backup.
Postgres by itself doesn't have a great horizontal scaling strategy as of now, I think. You need Citus or something like that on top; maybe your friend was referencing that?
[1,2] https://www.postgresql.org/docs/13/routine-vacuuming.html https://www.postgresql.org/docs/current/planner-stats.html
The main issue we get is the 1 connection = 1 process issue although there are ways to mitigate that (namely pgbouncer).
Both are a degree more difficult than NoSQL. The main issue is maintaining schemas.
nowadays postgres in the cloud does all of this for you.
And don't take my word for it, see for yourself here:
https://en.wikipedia.org/wiki/Comparison_of_relational_datab...
And MySQL is an Oracle product these days, go with MariaDB instead as this one is a MySQL fork made by the original papa of MySQL.
If I have a Django + PG query that takes 1 second and I want to deeply inspect the breakdown of that entire second, where might I begin reading to learn what tools to use and how?
Take the result of `EXPLAIN ANALYZE` on the query and paste it into https://explain.depesz.com/
which will make it human readable.
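Assuming you've already pulled the raw SQL out of Django (e.g. via the debug toolbar), the command being discussed is just (table and filter invented):

```sql
-- ANALYZE actually executes the query and reports real timings;
-- BUFFERS adds I/O detail per plan node
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM app_order WHERE customer_id = 42;
```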
Understanding this is sometimes very easy, but if you want to understand what they _really_ mean, you can read depesz.com
It covers all major databases and is a good start to dive into database interna and how to interpret output from query analyzers.
Other than that, I highly recommend joining the mailing list and IRC (#postgresql on libera.chat).
Lots of valuable tricks being shared there by people with decades of experience.
It can also be used with Django Rest Framework via the browsable api.
May be parent is looking for deeper insight than this but it is useful to do quick visual query inspection.
django debug toolbar (or similar) should be the first thing you go to because these tools understand the django ORM well.
the other thing that comes to mind is enabling query timing in your django shell. i believe you might need an extension for this.
then you can look at the postgres itself. but i would keep it at the django layer at first because it might reveal something about the ORM.
Note that this isn't valid SQL, just an approximation, because Django doesn't generate a single SQL string, but uses the underlying library's parameterization. So you'll have to fiddle with quotes and such to get SQL you can run the EXPLAIN on that's mentioned in the other replies.
Instead you are likely to be forced to use a cloud hosted PostgreSQL instance in order to get HA/clustering.
The concepts behind Vitess are sufficiently general to simply apply them to PostgreSQL now that PostgreSQL has logical replication. In some ways it can be even better due to things like replication slots being a good fit for these sorts of architectures.
The work to port Vitess to PostgreSQL is quite substantial however. Here is a ticket tracking the required tasks at a high level: https://github.com/vitessio/vitess/issues/7084
I know this is not optimized SQL. But this takes about 5 seconds in Postgres while the same command runs in milliseconds in MSSQL Server. The APCRoleTableColumn table has only about 5000 records. The above query deletes from APCRoleTableColumn all columns not present in the schema.
I used to be a heavy MSSQL user. I do love Postgres and have switched over to using it in all my projects and am not looking back. I wish it was as performant as MSSQL, though. This is just one example; I can list a number of others too.
If you don't need the NOT IN weirdness around NULL values then I'd suggest you just use a NOT EXISTS. That'll allow something more efficient like a Hash Anti Join to be used during the DELETE. Something like:
Delete From "APCRoleTableColumn"
Where Not Exists (
    Select 1
    From information_schema.columns SC
    Inner Join "APCRoleTable" RT On SC.table_name = RT."TableName"
    Where RT."TableName" = "APCRoleTableColumn"."TableName"
      And SC.column_name = "APCRoleTableColumn"."ColumnName"
      And SC.table_schema = 'public'
);
Is that faster now?
As others have said, explain analyze will show you what’s going on. I’m fairly sure this query would be fixed by flipping and / or adding an index. 5k records is nothing to pg.
In an ideal world PostgreSQL would handle an infinite number of connections without a connection pool. Unlikely in practice, though.
There are good practical reasons to actually limit the number of connections:
(a) CPU efficiency (optimal number of active connections is 1-2x number of cores)
(b) allows higher memory limits
(c) lower risk of connection storms
(d) ... probably more
Some applications simply ignore this and expect a rather high number of connections, with the assumption that most of them will be idle. Sometimes the connections are opened/closed frequently, making it worse.
Eliminating the need for a connection pool in those cases would probably require significant changes to the architecture, so that e.g. forking a process is not needed.
But my guess is that's not going to happen. A more likely solution is having a built-in connection pool which is easier to configure / operate.
Separate connection pools (like pgbouncer) are unlikely to go away, though, because being able to run them on a separate machine is a big advantage.
- 12-10-2020: "Most regression tests are passing, but write-speeds are still low."