Difference between running Postgres for yourself and for others (opens in new tab)

(ubicloud.com)

188 pointspwmtr1y ago68 comments

68 comments

39 comments · 7 top-level

pwmtrOP1y ago· 20 in thread

Hey, author for the blog post is here. If you have any questions or comments, please let me know!

It's also worth calling out the first diagram shows dependencies between features for Ubicloud's managed Postgres. AWS, Azure, and GCP's managed Postgres service would have a different diagram. That's because we at Ubicloud treat write-ahead logs (WAL) as a first class citizen.

rembicilious1y ago

From the article: “If the user isn’t actively working on the database, PostgreSQL won’t generate a WAL file each minute. WAL files are by default 16 MB and PostgreSQL waits for the 16 MB to fill up. So, your restore granularity could be much longer, and you may not be able to restore to a particular minute in time.

You can overcome this problem by setting the archive_timeout and forcing Postgres to generate a new WAL file every minute. With this configuration, Postgres would create a new WAL file when it hits the 1-minute or 16 MB threshold, whichever comes first.

The second issue with backup/restore is “no activity”. In this case, PostgreSQL wouldn’t create a new file even if the archive_timeout is set. As a solution, you can generate artificial write activity by calling pg_current_xact_id().”

Can you explain why to create a WAL file even though there is no activity?

pwmtrOP1y ago

We want to ensure that the users are able to restore to any minute in the past.

Then, as @pintxo, @bjornsing, and @Doxin asked below; Why not let users pick a time and then replay the closest backup? Why create empty backups? This is a valid question.

The answer is that without an empty backup, it's not possible to confirm there was no write activity. Maybe some write activity occurred but you fail to archive the WAL file. You need at least one WAL file (empty or not) with a timestamp equal to or more recent than the target restore time to ensure there were no write activity.

I didn't explain this in detail in the blog post for the sake of brevity, but you don't need to create an empty WAL file for each minute with no write activity. For instance, if there was no write activity for 10 minutes and then a WAL file is generated, PostgreSQL can deduce there was no write activity during those 10 minutes and is able restore to any minute within that period. However, if the user wants to restore to 5 minutes ago and there has been no write activity for the past 10 minutes, PostgreSQL doesn't have a WAL file to confirm lack of write activity. In this case, you need to call pg_current_xact_id() to generate a WAL file. So the trick is, when you get a restore request, call pg_current_xact_id() only once, just in case if there were no write activity.

This is a good example of difference between running PostgreSQL for yourself vs. others. When running PostgreSQL for yourself, such edge cases is not a concern. However, when managing it for others, an edge case like this could prevent a restore and create an operational burden. Automating this process is required especially if you are managing lots of PostgreSQL databases.