Do we fear the serializable isolation level more than we fear subtle bugs (2024) (opens in new tab)

(blog.ydb.tech)

88 pointsb-man23d ago61 comments

61 comments

36 comments · 8 top-level

lukas22119d ago· 9 in thread

I would argue that not using serialization isolation level by default is like not using a memory safe programming language by default.

Sure, sometimes it's too slow, but it should be the default.

Very few people can write correct database code at the other serialization levels. Most think they can, but it's harder than correct multi-threading, because databases do weird unintuitive things for performance.

zadikian19d ago

In my experience, the performance hit is so bad that it's not feasible to use that way. It's also not strictly safer behavior-wise because retries can trip people up.

jiggawatts19d ago

I recently encountered a query that deadlocked on itself because it used a parallel execution plan and updated multiple indexes in a manner that the different threads could conflict with each other.

Naively one would expect that no individual UPDATE or INSERT statement could deadlock in isolation… but there you go.

If that is possible, the possibilities across multiple concurrent data mutating queries are beyond human comprehension!

Serializable should absolutely be the default!

Similarly, all columns should be automatically indexed to at least some degree, like Postgres BRIN indexes at a minimum.

Time and experience have shown that the vast majority of developers are pathologically unable to properly define all required indexes ahead of time.

hilariously19d ago

I've definitely experienced the opposite, where indexes run rampant and are NN times the size of data (for the entire database!)

That being said also "required indexes ahead of time" is impossible - because query patterns change and things get released unless you mean something like "the required indexes for our obvious query patterns we just freaking released.

I dont think most columns need to be indexed because that is mostly nonsense, most columns never have a where clause applied to them or are sorted on themselves, so an index provides ??? value.

63stack19d ago

You are supposed to define indexes based on how you query the data, not ahead of time.

2 more replies

RaftPeople18d ago

> I recently encountered a query that deadlocked on itself because it used a parallel execution plan and updated multiple indexes in a manner that the different threads could conflict with each other.

MS SQL?

We've had to set MAXDOP 1 on some specific queries for a long time.

mike_hearn18d ago

Worth noting that Oracle has an auto-indexing feature that builds and drops indexes based on observed query patterns. DBMS_AUTO_INDEX.

Disclosure: I work part time in the DB group.

lmm19d ago

If you're going to use serialisable isolation level, why bother using a traditional RDBMS at all? At that point you're better off using a simpler datastore.

vlovich12319d ago

Do you mean to say “if you’re NOT going to use serialisable”? I think you missed the NOT and every reply seemed to think you were arguing a different point but your description about not using foreign keys and using Redis instead only makes sense if there’s a NOT there.

1 more reply

cwillu19d ago

Huh?

1 more reply

stubish19d ago· 5 in thread

The article doesn't mention the biggest problem with serializable isolation. At every commit, you need handle the possibility of a serialization exception and retry the transaction. Traditionally devs and frameworks don't, so your application works fine during development and staging but starts failing under load. It makes commit failures normal, rather than an 'oh shit' problem because your disk has filled or someone has tripped over a network cable.

And how do you retry transactions? Then you hit another issue when using multiple datastores, where you need to learn about two-phase commit and the joys of manually keeping datastores in sync that don't support it (eg. filesystems).

And the locks, if you dare run batch updates along with web requests. The long running transactions lock everything they read, blocking short transactions. Because that is exactly what you asked for. Again, you will miss this during development and only notice under load.

So sure, you might avoid some data consistency issues if your data model and update patterns hit the edge cases. In practice, the reason details about serializable are not well known is the cases are rare. Using it gives you safety (maybe that rare case is your case!), but everything needs to be carefully designed around it.

mike_hearn18d ago

A bigger issue is scalability. Note that this blog post claims YugabyteDB uses serializable as the default isolation level, but if we look at the latest version of their docs it seems they backed off that and at some point the default became snapshot isolation, and then they backed off again and for new databases the default is read committed i.e. the same as Postgres and Oracle:

https://docs.yugabyte.com/stable/explore/transactions/isolat...

So they do allow write skew anomalies.

This happens because serializability requires the database to track every read and intersect them with every write, meaning every node in a distributed cluster must be aware of what every other node is doing. That hurts scalability and performance badly; it's one reason Oracle doesn't support true serializability (the other being that almost no existing DB backed apps can tolerate transaction aborts, as you observe).

There are other issues too. With pessimistic locking you can get deadlocks, but if you deadlock your app the database can detect that and get it moving again by rolling one of the transactions back. It surfaces to the user as a bug that they can easily monitor for and diagnose. With optimistic locking+retry, which serializability basically requires, you can get livelock. Livelock is terrible, the cluster appears to be busy doing useful work but just gets slower and slower. There's not good ways to automatically detect it, at least not in any app framework I've ever encountered.

Disclosure: in the Oracle DB group, opinions are my own and not that of the company. I actually argue with my manager about this topic sometimes, lol.

kccqzy19d ago

My experience just differs from you so much. Two of my employers extensively use serializable isolation and the code to handle retries is automatic. It helps that there are other reasons for retrying the transaction, like load shedding. The only bug I’ve observed is when a colleague forgot that the transaction could retry and the code wrote some logs to disk multiple times; it wasn’t even a correctness bug just a performance bug.

stubish19d ago

I'd say I had worse problems than many due to mixing long and short transactions on the same db. Which is something you need to avoid anyway if possible. The main webapp I dealt with also handled retries automatically, so we could do schema migrations live (with carefully crafted DML and other backend systems shutdown). I think we kept the webapp read committed (PostgreSQL backend), because that system didn't need the consistency guarantees.

Personally, I don't think databases or database libraries should specify a default isolation level at all, and that it needs to be explicitly specified. But legacy code and backwards compatibility and new user experience and all that. I think most of the issues I pointed out come down to needing to be aware of the consequences of your choices, or not even being aware that a choice has been made that can be very hard to change retroactively.

cousin_it18d ago

> At every commit, you need handle the possibility of a serialization exception and retry the transaction.

Yeah, but it seems so strange to me. Imagine if a database simply executed all transactions serially. Then there would be no serialization anomalies (though it would be slow, yada yada). So it seems serializable isolation presents a facade of serial execution, but only like, half a facade. You have to deal with the leak in the abstraction yourself and it's surprising to everyone who hears about it for the first time. I wonder why this choice was made.

andersmurphy18d ago

In SQLite all writes are serialisable by default and it scales really well. I think having a single writer is what makes the big difference here.

SoftTalker19d ago· 5 in thread

You may not need serializable isolation level, but you must understand the concurrency model of your database and the implications of it, and realize that they are not all the same. Oracle, Postgres, MySQL, SQL Server are all different.

ameliaquining19d ago

I think the argument is that thinking through the exact implications of your particular database's concurrency model for each query you write is too much to ask of generalist software engineers; if it's not kept down to a small auditable surface, mistakes will occur pervasively. Therefore, serializable (i.e., do it the obvious way without allowing tricky edge-case states) should be the default, and should be departed from only when performance demands it in a specific case, with careful analysis in those few cases to ensure correctness. (This is pretty closely analogous to the argument for memory-safe programming languages.)

The one complication is that this does mean applications need to be prepared for queries to need to be retried, and while this is reasonably straightforward in most cases (and can in principle be enforced with static analysis), it's not always done today, which makes it hard to change the defaults.

stubish19d ago

And this is why most of the popular database libraries default to 'read committed'. Any higher and your library and scripts became DB specific rather than working with most of the engines supporting ODBC (although even that was tricky, given the different SQL dialects out there).

zadikian19d ago

Tbh I always forget the specifics soon after reading them. Basically you can do an atomic UPDATE WHERE if there are no subqueries involved. 90% of the time that's good enough, and for anything else I end up refreshing on features like SELECT FOR UPDATE.

Well also I know Postgres UNIQUE indexes provide additional locking. Like you can do an INSERT... WHERE NOT EXISTS or INSERT... ON CONFLICT that is guaranteed to succeed.

anarazel18d ago

> Well also I know Postgres UNIQUE indexes provide additional locking. Like you can do an INSERT... WHERE NOT EXISTS or INSERT... ON CONFLICT that is guaranteed to succeed.

That's true only for the latter (and even then only at a isolation level that's not too strict).

1 more reply

hun319d ago

Right.

As an example: Oracle and PostgreSQL don't have dirty reads: READ UNCOMMITTED does nothing. MySQL's concurrency model depends on the engine.

hyperpape19d ago· 4 in thread

> According to the paper, “Of the 22 vulnerabilities, five were level-based, meaning that the default weak isolation level led to the anomalies behind the vulnerabilities. The remaining 17 were scope-based, meaning that the database accesses were not properly encapsulated in transactions and concurrent API requests could trigger the vulnerability independent of the level of isolation provided by the database backend.”

I don't want to commit to a real opinion, but the cynic in me sees a bitter lesson you could take from this is that the database should default to a low isolation level--the damn developers aren't even using transactions right, so why waste performance handling transactions in the strictest possible way?

klodolph19d ago

My experience is that there’s a percentage of developers who get it right and intuitively understand the serializable isolation level.

And database performance is a crapshoot. By “crapshoot”, I mean you mix some developers, data, and database technologies together and you get all sorts of weird performance profiles. Why sacrifice safety for performance, when you’re going to get performance problems anyway? It’s a lot easier to turn down the isolation level for a transaction, and it’s a lot harder to fix an unknown amount of inconsistent data in your database.

(The problem is that there are also a big cohort of developers who aren’t prepared to make their transactions retryable, and they’re writing their code in languages which don’t enforce transaction safety. By “languages which don’t enforce transaction safety”, I mean “languages other than Haskell”.)

onlyrealcuzzo19d ago

So every database in the world not designed in Haskell can't do a proper transaction?

1 more reply

stubish19d ago

Too many people discovered databases with MySQL back in the day and didn't even know transactions were possible. And now 25 years later one of those people is likely a senior dev and 'the db guy' on a team of people who learn SQL by example. Or now never learnt it, relying on an LLM trained on decades of dodgy PHP code.

1 more reply

vlovich12319d ago

That’s the same mentality C and C++ take. Even if it was only 20%, that’s a non trivial amount of vulnerabilities that have nothing to do with the code.

mastermedo19d ago· 3 in thread

> Surprisingly, there are many more stories and publications about bugs caused by weak isolation levels than cases where stronger isolation levels caused impractically low performance.

I expected the article to substantiate the claim that serializable brings a large performance hit as in my experience it isn't so. The article basically makes the same point.

With serializable, you need to be a little careful not to have hot rows. Avoid them by sharding commonly written values. Another way to improve performance is to use true time for ordering non read-then-write transactions. It's a little finicky if the database doesn't provide such guarantees out of the box. Take Google's Spanner as an example. It offers the serializable isolation level and it's pretty performant (as long as you account for hot spots).

mjb19d ago

> With serializable, you need to be a little careful not to have hot rows. Avoid them by sharding commonly written values

Unfortunately, serializable isolation requires detecting or preventing read-write conflicts (i.e. one transaction writing a row that a concurrent transaction has read). This is the performance impact of serializability: you need to be very careful what you read, because if you read too many rows you prevent any concurrent transactions from updating those same rows. Read-only transactions are OK (because MVCC), and read-only tables are OK (because there's no read-write conflict if a table is mostly read only), but tables that are both written and heavily read are where you get performance problems.

With snapshot isolation (e.g. Oracle's serializable, Postgres repeatable read), only write-write conflicts matter. There it doesn't matter what a transaction reads, and reads never need to block (or abort) writers. So what you say is true for snapshot, but not for serializable.

Interestingly, serializable's lack of need to detect write-write conflicts means that (in some implementations) it can be faster than snapshot for blind writes (i.e. anything that's not a read-modify-write under the covers).

mastermedo19d ago

I agree with what you're saying. I think we're saying the same thing. I see that in the text you're quoting I only proposed a way to handle hotly written rows, and didn't address read rows being changed. This is also a problem.

My broader point is that with serializable you need to be aware of these bottlenecks in the database and you need to create a data model and access patterns such that permance hits are avoided as much as possible.

kccqzy18d ago

It’s somewhat rare for a transaction to read many rows and then write. Typically a transaction reads a single row, applies business logic, writes a single row. Then you have analytic workloads that do no writes and read entire tables.

sanqui19d ago· 1 in thread

I only recently learned about serializable transactions and it seems bonkers that this is not the default. It makes a lot of sense combined with the event sourcing pattern. I believe it allows you to query for state in the decide function and then emit events safely without having to implement aggregates or versioning (aka you have "dynamic consistency boundaries"). The crucial part is that if any of the queried information changes before the event is emitted the transaction fails and business logic has to be retried until you get a conclusive answer.

sublinear19d ago

The need for retries is a not a small performance hit.

This is a rabbit hole worth going down, but it shouldn't be the default. This is a classic case of Chesterton's Fence.

MaulingMonkey19d ago· 1 in thread

TIL the horror that is the existence of non-serializable isolation.

I'm a gamedev. I've worked alongside webdevs (frontend and backend) that build our websites and forums. Alongside coworkers who handle networking stuff while I port things on the same project. Spotted SQLIs for people and pointed them on a better path [1]. I've dabbled in my own share of SQL-adjacent queries... which is to say databases have always been on my list of things I should probably take the time to put properly into my toolkit, for increased reliability and data durability. After all, rotating file snapshots by hand, and fuzzing formats to create recoverability from corruption (if only by detecting it and reverting to previous snapshots instead of crashing or corrupting further) is clearly the work of uncultured barbarians, bereft the wonders of proper fsync-aware ACID storage technologies.

And then I read this:

---------------------------------------------

However, many database vendors use weaker isolation levels by default, in particular:

• “Read committed” in PostgreSQL and Oracle.

• “Repeatable read” in MySQL/InnoDB (there is a subtlety, see below) in YugabyteDB.

---------------------------------------------

I regret the clearly undue respect and regard I've given to database technology. I knew some of this kind of nonsense had intruded with the NoSQL and sharding crowd, but I thought you at least had to ask for such ruination for most of the SQLs used in production, at least in the context of a singular database. Euhg.

1. https://blog.codinghorror.com/give-me-parameterized-sql-or-g...

stubish19d ago

The isolation level that the DB engine defaults to doesn't matter. That is for backwards compatibility with the 90s. What matters is what isolation level the connection library you are using defaults to. Unfortunately, that too is often backwards compatible with the 90s (aka ODBC).

zadikian19d ago

Was curious about the Flexcoin hack, but the article wasn't loading, so here's an archive: https://web.archive.org/web/20240423000007/https://hackingdi... Supposedly it was this simple:

  mybalance = database.read("account-number")
  newbalance = mybalance - amount
  database.write("account-number", newbalance)
  dispense_cash(amount)   // or send bitcoins to customer

and MongoDB didn't even have a way to do this atomically? An RDBMS with read-committed would handle this fine if you did "read for update" on that row.

j / k navigate · click thread line to collapse

61 comments

36 comments · 8 top-level

lukas22119d ago· 9 in thread

I would argue that not using serialization isolation level by default is like not using a memory safe programming language by default.

Sure, sometimes it's too slow, but it should be the default.

zadikian19d ago

In my experience, the performance hit is so bad that it's not feasible to use that way. It's also not strictly safer behavior-wise because retries can trip people up.

jiggawatts19d ago

I recently encountered a query that deadlocked on itself because it used a parallel execution plan and updated multiple indexes in a manner that the different threads could conflict with each other.

Naively one would expect that no individual UPDATE or INSERT statement could deadlock in isolation… but there you go.

If that is possible, the possibilities across multiple concurrent data mutating queries are beyond human comprehension!

Serializable should absolutely be the default!

Similarly, all columns should be automatically indexed to at least some degree, like Postgres BRIN indexes at a minimum.

Time and experience have shown that the vast majority of developers are pathologically unable to properly define all required indexes ahead of time.

hilariously19d ago

I've definitely experienced the opposite, where indexes run rampant and are NN times the size of data (for the entire database!)

I dont think most columns need to be indexed because that is mostly nonsense, most columns never have a where clause applied to them or are sorted on themselves, so an index provides ??? value.

63stack19d ago

You are supposed to define indexes based on how you query the data, not ahead of time.

2 more replies

RaftPeople18d ago

MS SQL?

We've had to set MAXDOP 1 on some specific queries for a long time.

mike_hearn18d ago

Worth noting that Oracle has an auto-indexing feature that builds and drops indexes based on observed query patterns. DBMS_AUTO_INDEX.

Disclosure: I work part time in the DB group.

lmm19d ago

If you're going to use serialisable isolation level, why bother using a traditional RDBMS at all? At that point you're better off using a simpler datastore.

vlovich12319d ago

1 more reply

cwillu19d ago

Huh?

1 more reply

stubish19d ago· 5 in thread

mike_hearn18d ago

https://docs.yugabyte.com/stable/explore/transactions/isolat...

So they do allow write skew anomalies.

Disclosure: in the Oracle DB group, opinions are my own and not that of the company. I actually argue with my manager about this topic sometimes, lol.

kccqzy19d ago

stubish19d ago

cousin_it18d ago

> At every commit, you need handle the possibility of a serialization exception and retry the transaction.

andersmurphy18d ago

In SQLite all writes are serialisable by default and it scales really well. I think having a single writer is what makes the big difference here.

SoftTalker19d ago· 5 in thread

ameliaquining19d ago

stubish19d ago

zadikian19d ago

Well also I know Postgres UNIQUE indexes provide additional locking. Like you can do an INSERT... WHERE NOT EXISTS or INSERT... ON CONFLICT that is guaranteed to succeed.

anarazel18d ago

> Well also I know Postgres UNIQUE indexes provide additional locking. Like you can do an INSERT... WHERE NOT EXISTS or INSERT... ON CONFLICT that is guaranteed to succeed.

That's true only for the latter (and even then only at a isolation level that's not too strict).

1 more reply

hun319d ago

Right.

As an example: Oracle and PostgreSQL don't have dirty reads: READ UNCOMMITTED does nothing. MySQL's concurrency model depends on the engine.

hyperpape19d ago· 4 in thread

klodolph19d ago

My experience is that there’s a percentage of developers who get it right and intuitively understand the serializable isolation level.

onlyrealcuzzo19d ago

So every database in the world not designed in Haskell can't do a proper transaction?

1 more reply

stubish19d ago

1 more reply

vlovich12319d ago

That’s the same mentality C and C++ take. Even if it was only 20%, that’s a non trivial amount of vulnerabilities that have nothing to do with the code.

mastermedo19d ago· 3 in thread

> Surprisingly, there are many more stories and publications about bugs caused by weak isolation levels than cases where stronger isolation levels caused impractically low performance.

I expected the article to substantiate the claim that serializable brings a large performance hit as in my experience it isn't so. The article basically makes the same point.

mjb19d ago

> With serializable, you need to be a little careful not to have hot rows. Avoid them by sharding commonly written values

mastermedo19d ago

kccqzy18d ago

sanqui19d ago· 1 in thread

sublinear19d ago

The need for retries is a not a small performance hit.

This is a rabbit hole worth going down, but it shouldn't be the default. This is a classic case of Chesterton's Fence.

MaulingMonkey19d ago· 1 in thread

TIL the horror that is the existence of non-serializable isolation.

And then I read this:

---------------------------------------------

However, many database vendors use weaker isolation levels by default, in particular:

• “Read committed” in PostgreSQL and Oracle.

• “Repeatable read” in MySQL/InnoDB (there is a subtlety, see below) in YugabyteDB.

---------------------------------------------

1. https://blog.codinghorror.com/give-me-parameterized-sql-or-g...

stubish19d ago

zadikian19d ago

Was curious about the Flexcoin hack, but the article wasn't loading, so here's an archive: https://web.archive.org/web/20240423000007/https://hackingdi... Supposedly it was this simple:

  mybalance = database.read("account-number")
  newbalance = mybalance - amount
  database.write("account-number", newbalance)
  dispense_cash(amount)   // or send bitcoins to customer

and MongoDB didn't even have a way to do this atomically? An RDBMS with read-committed would handle this fine if you did "read for update" on that row.

j / k navigate · click thread line to collapse