It's also worth calling out the first diagram shows dependencies between features for Ubicloud's managed Postgres. AWS, Azure, and GCP's managed Postgres service would have a different diagram. That's because we at Ubicloud treat write-ahead logs (WAL) as a first class citizen.
You can overcome this problem by setting the archive_timeout and forcing Postgres to generate a new WAL file every minute. With this configuration, Postgres would create a new WAL file when it hits the 1-minute or 16 MB threshold, whichever comes first.
The second issue with backup/restore is “no activity”. In this case, PostgreSQL wouldn’t create a new file even if the archive_timeout is set. As a solution, you can generate artificial write activity by calling pg_current_xact_id().”
Can you explain why to create a WAL file even though there is no activity?
Then, as @pintxo, @bjornsing, and @Doxin asked below; Why not let users pick a time and then replay the closest backup? Why create empty backups? This is a valid question.
The answer is that without an empty backup, it's not possible to confirm there was no write activity. Maybe some write activity occurred but you fail to archive the WAL file. You need at least one WAL file (empty or not) with a timestamp equal to or more recent than the target restore time to ensure there were no write activity.
I didn't explain this in detail in the blog post for the sake of brevity, but you don't need to create an empty WAL file for each minute with no write activity. For instance, if there was no write activity for 10 minutes and then a WAL file is generated, PostgreSQL can deduce there was no write activity during those 10 minutes and is able restore to any minute within that period. However, if the user wants to restore to 5 minutes ago and there has been no write activity for the past 10 minutes, PostgreSQL doesn't have a WAL file to confirm lack of write activity. In this case, you need to call pg_current_xact_id() to generate a WAL file. So the trick is, when you get a restore request, call pg_current_xact_id() only once, just in case if there were no write activity.
This is a good example of difference between running PostgreSQL for yourself vs. others. When running PostgreSQL for yourself, such edge cases is not a concern. However, when managing it for others, an edge case like this could prevent a restore and create an operational burden. Automating this process is required especially if you are managing lots of PostgreSQL databases.
I'm asking because logical replication seems to become more and more common as a solution to automatically replicate data to external systems (another Postgres instance, Kafka, data warehousing, or offline sync systems in my case), but many cloud providers appear to not support it. Many others do (including AWS, GCP), so I'd also be interested in how they handle high availability.
There are many reasons that cloud providers don't want to support logical replication;
- It requires giving superuser access to user. Many cloud providers don't want to give that level of privilege. Some cloud providers fork PostgreSQL or write custom extensions to allow managing replication slots without requiring superuser access. However, doing it securely is very difficult. You suddenly open up a new attack vector for various privilege escalation vulnerabilities.
- If user creates a replication slot, but does not consume the changes, it can quickly fill up the disk. I dealt many different failure modes of PostgreSQL, and I can confidently say that disk full cases one of the most problematic/annoying ones to recover from.
- It requires careful management of replication slots in case of fail over. There are extensions or 3rd party tools helping with this though.
So, some cloud providers don't support logical replication and some support it weakly (i.e. don't cover all edge cases).
Thankfully there are some improvements are being done in PostgreSQL core that simplifies failover of logical replication slot (check out this for more information https://www.postgresql.org/docs/17/logical-replication-failo...), but it is still too early.
* You describe them as wanting to deploy the database in container: why would anyone do that (unless for throwaway testing or such)?
* The certificate issue seems also very specific to the case when something needs to go over public Internet to some anonymous users... Most database deployments I've seen in my life fall into one of these categories: database server for a Web server, which talk in a private local network, or database backing some kind of management application, where, again the communication between the management application and the database happen without looping in the public Internet.
* Unwilling to wait 10 minutes to deploy a database. Just how often do they need new databases? Btw. I'm not sure any of the public clouds have any ETAs on VM creation, but from my practice, Azure can easily take more than 10 minutes to bring up a VM. With some flavors it can take a very long time.
The only profile I can think of is the kind of user who doesn't care about what they store in the database, in the sense that they are going to do some quick work and throw the data away (eg. testing, or maybe BI on a slice of data). But then why would these people need HA or backups, let alone version updates?
* We saw many deployments where communication between web server and database were going through public internet. It doesn't need to be for anonymous users. It is also even somewhat common where web server and database are managed by different SaaS providers, so they have to (in most cases) communicate through public network.
* We (and all cloud providers) are trying to reduce overall provisioning time, mostly to reduce the friction for first time users. There is no SLA but for common instance types, it would be unusual to wait for more than 1 minute at AWS, Azure or Ubicloud for VM provisionings.
Maybe you and I just experience different parts of the big computing ecosystem, hence what is "usual" for each of us is different. Out of curiosity, are you coming from enterprise background?
For the first one, the answer would be no, because I don't have any expertise on MySQL. I only used it years ago on my hobby projects. I'm not the best person to write something like this for MySQL.
For the second one, the answer would be maybe, but not any time soon. We have list of services that we want to build at Ubicloud and currently services like K8s have more priority.
What is MMW on the flowchart?
I think you may benefit from researching the ecosystem some more.
If you have concrete knowledge, please share it and don't be cryptic!
I certainly did not know that.
Here the project homepage with list of supported software: https://linux-system-roles.github.io/#toggleMenu
"The problem with the server certificate is that someone needs to sign it. Usually you would want that certificate to be signed by someone who is trusted globally like DigiCert. But that means you need to send an external request; and the provider at some point (usually in minutes but sometimes it can be hours) signs your certificate and returns it back to you. This time lag is unfortunately not acceptable for users. So most of the time you would sign the certificate yourself. This means you need to create a certificate authority (CA) and share it with the users so they can validate the certificate chain. Ideally, you would also create a different CA for each database."
Couldn't you automate this with Let's Encrypt instead?
Thanks!
Self signing probably causes quite a few other issues, even though you have more control of the process, doesn't it?
Thanks!
DigitalOcean uses 5 minutes as archive_timeout, which is also a reasonable value. In our experience, we saw that most of our customers prefer less exposure and we settled on 1 minute as archive_timeout value.
Roughly archive_timeout defines your RPO(recovery point objective).
I would think you'd want at most one certificate authority per customer rather than per database. Why am I wrong?
It is likely that we would add a organization-like entity in the future to our data model and at that time sharing certificate authority would make more sense.