tptacek 4y ago

I mean: just to be clear: you can't write to a read replica. There's no way to introduce a conflict that way.

Also just for what it's worth: we agree with 'foobarbazetc. There's a section about that in the post. If you're saying there are important classes of applications this doesn't work well for, that's true.

foobarbazetc 4y ago

Just to be clear, I'm not trying to say it's bad or whatever. It's super cool! I'm a big fan of fly.io.

I just don't see how something like this actually works in a way where you can reason about transaction ordering without global serialisation of requests against the writable database, which I presume isn't happening behind the scenes?

Basically, this would seem to work great if every request did INSERTs only and you based no logic to run INSERTs on anything you SELECT'ed from the read only replicas.

But you could have situations where, e.g., you DELETE or UPDATE something in a request to one region, and it goes to replay that against the writable region, and in the "replay gap" another request modifies the same rows or objects, and based on various factors such as latency, a DELETE ... WHERE or UPDATE ... WHERE clause might no longer hold. Or you UPDATE the wrong objects, or DELETE the wrong data, etc.

I did read the post, but I guess I need to re-read it. I have written a globally distributed database before so I'm always intrigued by how these work. :)

mrkurt 4y ago

INSERT, DELETE, UPDATE will all fail in the readonly regions. This is normal for Postgres read replicas, we're not doing anything special here.

What we're doing works identically to a vanilla HTTP based app. Requests that modify the DB always run against the primary database. Requests that perform reads _then_ modify the DB always run against the primary database.

HTTP services all have an underlying eventual consistency problem. If you are viewing a page with a database ID on it, then click delete for that ID, the record could be gone by the time the request hits the database (because someone else might've clicked while you were reading).
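
To make that concrete, here's a minimal sketch of the "record is already gone" case. It uses Python's built-in sqlite3 as a stand-in for Postgres, and the `items` table and `delete_item` helper are made up for illustration. The only point is that the second DELETE matches zero rows, and the app can detect that from the row count no matter which region the request came from.

```python
import sqlite3

# In-memory SQLite stands in for the primary database (assumption: the
# real setup would be Postgres; the table and helper are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items (id, name) VALUES (1, 'widget')")
conn.commit()


def delete_item(conn, item_id):
    """Delete one row and report whether it still existed."""
    cur = conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
    conn.commit()
    # rowcount is the number of rows the DELETE actually removed.
    return cur.rowcount == 1


# Two users were looking at the same page; both click delete.
first = delete_item(conn, 1)   # this request reaches the database first
second = delete_item(conn, 1)  # the row is already gone for this one
```

Here `first` is True and `second` is False, which is the same outcome you'd get with two users hitting a single-region app.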

Does that help? I think we didn't describe this well enough, it's way simpler under the covers than you might expect.

nightpool 4y ago

I don't think you're quite understanding how the replay mechanism works. The original HTTP request fails because it can't DELETE or UPDATE something, so the entire request gets re-routed and then handled by the app again, from the beginning. So your SELECTs (as long as they're in the same HTTP request) will always get run against the writable leader.
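
A rough sketch of that flow, in case it helps. Everything here is hypothetical (`FakeDB`, `delete_handler`, `route` are toy stand-ins, not Fly.io's or Postgres's actual APIs), and replication lag isn't modelled since both fake databases share one dict. It only shows that the handler is re-run from the top, so its SELECT runs again on the primary.

```python
class ReadOnlyError(Exception):
    """Stand-in for the error a read replica raises when it receives a write."""


class FakeDB:
    """Hypothetical in-memory database, not a real Postgres client."""

    def __init__(self, rows, read_only):
        self.rows = rows            # shared dict, so no replication lag is modelled
        self.read_only = read_only
        self.queries = []           # log of statements that actually ran

    def select(self, key):
        self.queries.append(("SELECT", key))
        return self.rows.get(key)

    def delete(self, key):
        if self.read_only:
            raise ReadOnlyError("cannot execute DELETE on a read-only replica")
        self.queries.append(("DELETE", key))
        self.rows.pop(key, None)


def delete_handler(db, key):
    """Application code: reads, then writes, unaware of which region it runs in."""
    if db.select(key) is None:
        return "404 not found"
    db.delete(key)
    return "200 deleted"


def route(handler, key, replica, primary):
    """Try the local region first; on a write error, replay the whole request on the primary."""
    try:
        return handler(replica, key), "replica"
    except ReadOnlyError:
        # The request starts over from the beginning, so every query in it,
        # reads included, now runs against the writable database.
        return handler(primary, key), "primary"
```

Calling `route(delete_handler, "42", replica, primary)` with a row under `"42"` leaves a lone SELECT in the replica's log and a SELECT followed by a DELETE in the primary's log.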

JackC 4y ago

Think of it like, if your read-only regions don't have anything else stateful they're able to do before the postgres write fails, then their existence is indistinguishable from network latency. An app that doesn't break because of latency won't break because of this.

If the readonlies might do something else stateful and not unwind it on postgres write error, then this approach won't work. (But wouldn't they be buggy anyway? Postgres writes aren't guaranteed not to raise errors.)
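
A toy illustration of that failure mode, with made-up names (`sent_emails` stands in for any non-Postgres state, and `replay` is a hypothetical stand-in for the request being re-run on the primary). It's a sketch of the ordering problem, not anyone's real code.

```python
class ReadOnlyError(Exception):
    """Stand-in for the error raised when a write hits a read-only region."""


sent_emails = []  # stands in for any stateful side effect outside Postgres


def write(read_only):
    """Pretend database write that fails in a read-only region."""
    if read_only:
        raise ReadOnlyError()


def unsafe_signup(read_only, user):
    sent_emails.append(user)  # side effect happens before the write...
    write(read_only)          # ...and is not unwound when the write fails


def safe_signup(read_only, user):
    write(read_only)          # write first, so a read-only region fails fast
    sent_emails.append(user)  # side effect only after the write succeeded


def replay(handler, user):
    """Run in a read-only region, then replay the whole request on the primary."""
    try:
        handler(True, user)
    except ReadOnlyError:
        handler(False, user)
```

With `replay(unsafe_signup, "a")` the email goes out twice; with `replay(safe_signup, "b")` it goes out once. The unsafe version would also leave a stray email behind on any ordinary Postgres error, replay or not.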

rfoo 4y ago

That's because the database fails with no changes made, and they retry the whole HTTP request with all business logic in the application re-run against the primary instance. So, nothing to do with database-level stuff.

The idea is basically "there is a useful concept of 'group of queries' which is invisible to the DB" :)

manigandham 4y ago

That's no different than just having a single region serving your application with users that are different distances and latencies away. The one farther away can issue an UPDATE/DELETE at the same time as the closer user, and the closer one would win.

It's all a game of latencies and eventual consistency, so what Fly.io is doing is just treating writes as an HTTP request that has to travel all the way to the primary origin, triggered by an error at the regional location rather than by your application actively doing it.