Transaction Isolation in Postgres

(thenile.dev)

160 points · jerrinot · 2y ago · 25 comments

22 comments · 6 top-level

nuttingd · 2y ago · 8 in thread

One caveat to serializable transactions in Postgres is that ALL concurrent transactions must be running with the SERIALIZABLE isolation level to protect against serialization anomalies.

This is a bit jarring if you come from MSSQL, which implements the SERIALIZABLE isolation level using locks. In MSSQL, you can rest assured that a serializable transaction will not be affected by changes from other concurrent transactions, regardless of their isolation level.

In Postgres, you may have a set of transactions all participating in SERIALIZABLE isolation today, but tomorrow someone adds another script without the SERIALIZABLE isolation level, and now your protected paths are no longer isolated.

magicalhippo · 2y ago

We use Sybase SQLAnywhere at work, which also implements SERIALIZABLE using locks. Naive me thought that meant a lock on the table, but no, it locks all the rows... Not great for a table with many rows!

We were essentially trying to avoid inserting the same value twice, so we ditched SERIALIZABLE and instead added a unique index along with a retrying loop on the client side.
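For anyone who wants to see the shape of that approach, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in (not SQL Anywhere or Postgres), and the table `codes`, the index name, and the helper `insert_with_retry` are made up for illustration. The pre-inserted row plays the role of a concurrent writer that got there first.

```python
import sqlite3


def insert_with_retry(conn, next_value, max_attempts=5):
    """Insert next_value() into codes.value, retrying when the unique index rejects it.

    next_value is a hypothetical callable that proposes a new candidate on each call.
    Returns the accepted value and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        value = next_value()
        try:
            with conn:  # commits on success, rolls back if the insert raises
                conn.execute("INSERT INTO codes (value) VALUES (?)", (value,))
            return value, attempt
        except sqlite3.IntegrityError:
            # The unique index refused a duplicate: ask for another candidate.
            continue
    raise RuntimeError("gave up after %d attempts" % max_attempts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (value TEXT NOT NULL)")
conn.execute("CREATE UNIQUE INDEX codes_value_uniq ON codes (value)")
# Simulates another client having already inserted 'A1'.
conn.execute("INSERT INTO codes (value) VALUES ('A1')")
conn.commit()

proposals = iter(["A1", "A2"])
result = insert_with_retry(conn, lambda: next(proposals))
row_count = conn.execute("SELECT COUNT(*) FROM codes").fetchone()[0]
```

The database enforces uniqueness no matter which isolation level the other clients use, which is why this can replace SERIALIZABLE for the narrow "never insert the same value twice" case.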

brasetvik · 2y ago

Or from the other perspective of the trade-off: One caveat with MSSQL is that ALL concurrent transactions must pay the overhead if _some_ transactions need serializable guarantees?

vore · 2y ago

Only if they touch the same data. If they are touching disjoint sets of data then there is no overhead to be paid by non-SERIALIZABLE transactions.

bob1029 · 2y ago

There has been some recent improvement to locking behavior:

https://learn.microsoft.com/en-us/sql/relational-databases/p...

Fire-Dragon-DoL · 2y ago

Oh that's super nasty, is it mentioned somewhere in the doc?

Is it the same for repeatable read?

nuttingd · 2y ago

I have read the docs plenty of times, but it never stuck for me until I read the (free!) PostgreSQL 14 Internals ebook: https://postgrespro.com/community/books/internals

Quoted from Page 70:

> If you use the Serializable level, it must be observed by all transactions of the application. When combined with other levels, Serializable behaves as Repeatable Read without any notice. So if you decide to use the Serializable level, it makes sense to modify the default_transaction_isolation parameter value accordingly -- even though someone can still overwrite it by explicitly setting a different level.

I had a real "WTF?" moment when I read this the first time.
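Since the level can still be overridden per transaction, one possible safeguard is to check it at the top of any code path that relies on it. A rough sketch, assuming only that `SHOW transaction_isolation` is available (it is a standard Postgres command); `run_scalar` and `fake_session` are hypothetical stand-ins for whatever your driver provides, so the snippet runs without a database.

```python
def require_serializable(run_scalar):
    """Raise unless the current transaction reports the serializable level.

    run_scalar is a hypothetical callable: it runs one SQL statement on the
    current connection and returns the single value of the first row.
    """
    level = run_scalar("SHOW transaction_isolation")
    if str(level).strip().lower() != "serializable":
        raise RuntimeError("expected serializable, got %r" % (level,))
    return level


def fake_session(level):
    """Stand-in for a real connection; always reports the given level."""
    def run_scalar(sql):
        assert sql == "SHOW transaction_isolation"
        return level
    return run_scalar


ok = require_serializable(fake_session("serializable"))

try:
    require_serializable(fake_session("read committed"))
    rejected = False
except RuntimeError:
    rejected = True
```

This only guards the paths that call it. It does nothing about some other script that connects with a lower level and never runs the check, which is the failure mode described above.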

gwen-shapira · 2y ago

It is mentioned in the docs, but it is easy to misunderstand:

"If a pattern of reads and writes among concurrent serializable transactions would create a situation which could not have occurred for any serial (one-at-a-time) execution of those transactions, one of them will be rolled back with a serialization_failure error."

Note that it says nothing about the non-serializable transactions.

https://www.postgresql.org/docs/current/sql-set-transaction....

TomaszZielinski · 2y ago

This is pretty intuitive when you think about predicate locks that Postgres uses to detect conflicts.

If you have one SERIALIZABLE transaction that sets some locks, and one non-SERIALIZABLE that doesn't, then they can't "see" each other "by definition".

But your point stands--there could be some kind of "warning flag" somewhere, that would alert if SERIALIZABLE transactions overlap with non-SERIALIZABLE ones. Or maybe there _already_ is something like that??
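To make the "can't see each other" intuition concrete, here is a toy model. It is not how Postgres is implemented (the real thing uses SIREAD locks at several granularities and extra conditions on commit order). The assumptions: every listed transaction overlaps with every other, only SERIALIZABLE transactions take part in tracking, and a conflict is flagged when some transaction has both an incoming and an outgoing read-write dependency. All names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    serializable: bool
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def rw_edges(txns):
    """Return (reader, writer) pairs: reader read an item that writer wrote.

    Only transactions marked serializable are tracked; the others are
    invisible to the detector.
    """
    tracked = [t for t in txns if t.serializable]
    return {
        (r.name, w.name)
        for r in tracked
        for w in tracked
        if r is not w and r.reads & w.writes
    }


def has_dangerous_structure(txns):
    """True if some transaction has both an outgoing and an incoming rw-edge."""
    edges = rw_edges(txns)
    with_outgoing = {reader for reader, _ in edges}
    with_incoming = {writer for _, writer in edges}
    return bool(with_outgoing & with_incoming)


def write_skew(t2_serializable):
    """Classic write skew: each side reads both rows, then updates a different one."""
    items = {"alice_on_call", "bob_on_call"}
    t1 = Txn("T1", True, reads=set(items), writes={"alice_on_call"})
    t2 = Txn("T2", t2_serializable, reads=set(items), writes={"bob_on_call"})
    return [t1, t2]


both_serializable = has_dangerous_structure(write_skew(True))
mixed_levels = has_dangerous_structure(write_skew(False))
```

With both transactions tracked, the cycle is visible and one of them would be rolled back. As soon as T2 drops out of tracking, the detector has nothing to compare T1 against, so the same write skew goes through unnoticed.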

watters · 2y ago · 3 in thread

> For reasons that should be obvious to anyone with a bank account, you really really want both updates to happen, or neither. This is what atomicity guarantees - that the entire transaction will either succeed or fail as a single unit.

I understand why this example feels particularly illustrative of the value of transactions, but many, if not most, financial "transactions" can't practically rely on this kind of atomicity for the kind of financial operation depicted.

While it may seem like a small thing, I think authors would do everyone a favor to stop using the "banking transactions, obvs" example.

TheNewsIsHere · 2y ago

I suppose this is a good example if your reader knows how banking systems work.

A better direct example in the same line of reasoning would be double-entry accounting where you would want both the credit and debit entry to either fail or succeed.

Most people probably don’t know that their bank account _is_ a double-entry account to their banking institution.

I can’t noodle a way to make the banking example more intuitive for an audience absent explaining how double-entry accounting works and that banks mostly obscure that from the customer. That’s not really knowledge you can assume from a software developer or sysadmin.
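For what it's worth, the double-entry version is short enough to sketch. This uses Python's built-in sqlite3 purely as a convenient stand-in; the `ledger` table, its columns, and `post_transfer` are invented for illustration, and signed amounts are a simplification of real debit/credit bookkeeping.

```python
import sqlite3


def post_transfer(conn, from_account, to_account, amount):
    """Write both ledger entries in one transaction: both land or neither does."""
    with conn:  # commit if both inserts succeed, roll back if either raises
        conn.execute(
            "INSERT INTO ledger (account, amount) VALUES (?, ?)",
            (from_account, -amount),
        )
        conn.execute(
            "INSERT INTO ledger (account, amount) VALUES (?, ?)",
            (to_account, amount),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (account TEXT NOT NULL, amount INTEGER NOT NULL)"
)
conn.commit()

post_transfer(conn, "alice", "bob", 100)

try:
    # The second entry violates NOT NULL, so the first entry is rolled back too.
    post_transfer(conn, "alice", None, 50)
except sqlite3.IntegrityError:
    pass

rows = conn.execute(
    "SELECT account, amount FROM ledger ORDER BY account"
).fetchall()
total = conn.execute("SELECT SUM(amount) FROM ledger").fetchone()[0]
```

The failed transfer leaves no half-written entry behind, so the ledger still sums to zero.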

lmm · 2y ago

In my experience most things you want to do turn out to be impossible to achieve with RDBMS-level transactions, and you end up having to implement the behaviour that you need "by hand" with the database's transaction support mostly getting in your way. So in a subtle way banking transactions are actually a pretty good example.

banq · 2y ago

Transactions belong to the business logic; please use DDD.

Exuma · 2y ago · 3 in thread

This is literally one of those topics I have to come back and read time and time and time again every time I need it like 2 times a year. Maybe this article will finally make it stick.

gwen-shapira · 2y ago

Possibly. I tried hard to have some memorable examples.

But I'll be honest: Every time I got review comments (or re-reviewed myself), it took a bit of time for my brain to warm up again into "transaction mode".

The big lesson may be that concurrent transactions are pretty hard to reason about without external assistance like diagrams or test scenarios. I really like the system Postgres uses for transaction testing (AKA - deterministic simulation testing). Create scenarios that match your business logic, then run them serially but with different orderings of statements, and make sure the results are as you expect.
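The "different orderings of statements" part can be sketched in a few lines. This is not Postgres's test harness, just an illustration of enumerating every interleaving of two transactions while keeping each transaction's own statements in order; the statement strings are placeholders.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    # Either the next statement comes from a ...
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    # ... or it comes from b.
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


t1 = ["T1: read x", "T1: write y"]
t2 = ["T2: read y", "T2: write x"]

schedules = list(interleavings(t1, t2))
for schedule in schedules:
    print(" | ".join(schedule))
```

Each schedule would then be replayed against a test database, one statement at a time on the matching connection, with an assertion on the final state. The number of schedules grows quickly with transaction length, so in practice you would probably pick the orderings that matter rather than run all of them.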

rabee3 · 2y ago

this! never sticks for long, and I like the way this article flows in explaining isolation levels. definitely bookmarking it to get back to it later when needed.

TomaszZielinski · 2y ago

I've been dealing with transactions regularly in the past few years, but not constantly, as things work correctly most of the time. And I still need to refresh my memory pretty much every time I go back to revisit some transactional code, or to answer any non-trivial questions about it.

So I guess it's just the way it is :)

masfuerte · 2y ago · 2 in thread

What is "the pre"?

itunpredictable · 2y ago

I came here to ask this

gwen-shapira · 2y ago

It should read "code" instead of "pre".

I'm pushing a fix right now.

ikhare · 2y ago

When I first learned about isolation levels in databases I was shocked that databases could “lie” to me. I think like most devs focused on the product end I just expected databases to be a magical black box that worked perfectly. Which I assumed was just the strictest definition of serializability without really thinking about it.

After watching some of Andy Pavlo’s lectures[1] it all just dawned on me: databases are just like any other piece of code you write, and you have to think about all the tradeoffs in algorithms and bookkeeping to keep things efficient and provide the guarantees you want.

I highly recommend that lecture series.

Shameless plug: the reason I watched those lectures was to understand the internals of DBs better, because I started working at Convex, where we try to make sure things like this are something an app developer doesn’t have to worry about. Though we do mention it in our docs[2] for the curious.

[1] https://www.youtube.com/watch?v=LWS8LEQAUVc&list=PLSE8ODhjZX...

[2] https://docs.convex.dev/database/advanced/occ

PeterCorless · 2y ago

Good articles on the difference between linearizability and serializability. They are not the same thing.

https://accelazh.github.io/storage/Linearizability-Vs-Serial...

https://ajaygupta-spark.medium.com/linearizability-and-vs-se...
