Also just for what it's worth: we agree with 'foobarbazetc. There's a section about that in the post. If you're saying there are important classes of applications this doesn't work well for, that's true.
I just don't see how something like this actually works in a way where you can reason about transaction ordering without global serialisation of requests against the writable database, which I presume isn't happening behind the scenes?
Basically, this would seem to work great if every request did INSERTs only and you never based the decision to run an INSERT on anything you SELECT'ed from the read-only replicas.
But you could have situations where, e.g., you DELETE or UPDATE something in a request to one region, and it goes to replay that against the writable region, and in the "replay gap" another request modifies the same rows or objects. Depending on factors such as latency, a DELETE ... WHERE or UPDATE ... WHERE clause might then no longer hold. Or you UPDATE the wrong objects, or DELETE the wrong data, etc.
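To make the concern concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for Postgres. The `tickets` table and the two "requests" are made up for illustration; the point is only that a write whose WHERE clause was chosen from an earlier read can match nothing once another request has changed the row in between.

```python
import sqlite3

# Hypothetical schema: one ticket that starts out 'open'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO tickets VALUES (1, 'open')")
conn.commit()

# Request A reads the row (think: from a replica) and decides to close it.
seen_status = conn.execute(
    "SELECT status FROM tickets WHERE id = 1"
).fetchone()[0]

# In the gap, request B changes the same row.
conn.execute("UPDATE tickets SET status = 'archived' WHERE id = 1")
conn.commit()

# Request A's write arrives late, guarded by the status it saw earlier.
cur = conn.execute(
    "UPDATE tickets SET status = 'closed' WHERE id = 1 AND status = ?",
    (seen_status,),
)
conn.commit()

rows_changed = cur.rowcount  # how many rows A's UPDATE actually touched
final_status = conn.execute(
    "SELECT status FROM tickets WHERE id = 1"
).fetchone()[0]
print(rows_changed, final_status)
```

Here the guard makes request A's UPDATE a no-op rather than a wrong write, but the application still has to notice that `rows_changed` is 0 and decide what to do about it.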
I did read the post, but I guess I need to re-read it. I have written a globally distributed database before so I'm always intrigued by how these work. :)
What we're doing works identically to a vanilla HTTP based app. Requests that modify the DB always run against the primary database. Requests that perform reads _then_ modify the DB always run against the primary database.
HTTP services all have an underlying eventual consistency problem. If you are viewing a page with a database ID on it, then click delete for that ID, the record could be gone by the time the request hits the database (because someone else might've clicked while you were reading).
Does that help? I think we didn't describe this well enough; it's way simpler under the covers than you might expect.
If the readonlies might do something else stateful and not unwind it on postgres write error, then this approach won't work. (But wouldn't they be buggy anyway? Postgres writes aren't guaranteed not to raise errors.)
The idea is basically "there is a useful concept of 'group of queries' which is invisible to the DB" :)
It's all a game of latencies and eventual consistency, so what Fly.io is doing is just treating writes as an HTTP request that has to travel all the way to the primary origin, triggered by an error at the regional location rather than by your application actively doing it.
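A rough sketch of that idea as a toy simulation, not Fly.io's actual implementation. Every name here (`FakeDB`, `ReadOnlyError`, `route`, `delete_handler`) is hypothetical. The handler does its read and its write against whichever database it is given; when the regional replica rejects the write, the whole request is run again from the top against the primary, so the read is redone there too.

```python
class ReadOnlyError(Exception):
    """Stand-in for the error a read-only replica raises on a write."""


class FakeDB:
    """Toy key/value 'database'; a replica is just one with read_only=True."""

    def __init__(self, name, read_only, rows):
        self.name = name
        self.read_only = read_only
        self.rows = dict(rows)

    def select(self, key):
        return self.rows.get(key)

    def delete(self, key):
        if self.read_only:
            raise ReadOnlyError(f"{self.name} is read-only")
        self.rows.pop(key, None)


def delete_handler(db, key):
    # Read, then write, all against the same database.
    if db.select(key) is None:
        return "404 not found"
    db.delete(key)
    return "200 deleted"


def route(handler, replica, primary, *args):
    """Try the nearby replica; on a write error, replay the whole request."""
    try:
        return handler(replica, *args), replica.name
    except ReadOnlyError:
        return handler(primary, *args), primary.name


primary = FakeDB("primary", read_only=False, rows={42: "row"})
# The replica lags: it still has row 7, which is already gone on the primary.
replica = FakeDB("replica", read_only=True, rows={42: "row", 7: "stale"})

fresh_result = route(delete_handler, replica, primary, 42)
stale_result = route(delete_handler, replica, primary, 7)
print(fresh_result)
print(stale_result)
```

The second request is the interesting one: the replica still shows row 7, but because the replay starts over on the primary, the handler sees the current state and returns a 404 instead of acting on the stale read. As noted above, this only holds if the handler has no other side effects that survive the failed first attempt.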