Sure, sometimes it's too slow, but it should be the default.
Very few people can write correct database code at the other serialization levels. Most think they can, but it's harder than correct multi-threading, because databases do weird unintuitive things for performance.
Naively one would expect that no individual UPDATE or INSERT statement could deadlock in isolation… but there you go.
If that is possible, the possibilities across multiple concurrent data mutating queries are beyond human comprehension!
Serializable should absolutely be the default!
Similarly, all columns should be automatically indexed to at least some degree, like Postgres BRIN indexes at a minimum.
Time and experience have shown that the vast majority of developers are pathologically unable to properly define all required indexes ahead of time.
That being said also "required indexes ahead of time" is impossible - because query patterns change and things get released unless you mean something like "the required indexes for our obvious query patterns we just freaking released.
I dont think most columns need to be indexed because that is mostly nonsense, most columns never have a where clause applied to them or are sorted on themselves, so an index provides ??? value.
MS SQL?
We've had to set MAXDOP 1 on some specific queries for a long time.
Disclosure: I work part time in the DB group.
And how do you retry transactions? Then you hit another issue when using multiple datastores, where you need to learn about two-phase commit and the joys of manually keeping datastores in sync that don't support it (eg. filesystems).
And the locks, if you dare run batch updates along with web requests. The long running transactions lock everything they read, blocking short transactions. Because that is exactly what you asked for. Again, you will miss this during development and only notice under load.
So sure, you might avoid some data consistency issues if your data model and update patterns hit the edge cases. In practice, the reason details about serializable are not well known is the cases are rare. Using it gives you safety (maybe that rare case is your case!), but everything needs to be carefully designed around it.
https://docs.yugabyte.com/stable/explore/transactions/isolat...
So they do allow write skew anomalies.
This happens because serializability requires the database to track every read and intersect them with every write, meaning every node in a distributed cluster must be aware of what every other node is doing. That hurts scalability and performance badly; it's one reason Oracle doesn't support true serializability (the other being that almost no existing DB backed apps can tolerate transaction aborts, as you observe).
There are other issues too. With pessimistic locking you can get deadlocks, but if you deadlock your app the database can detect that and get it moving again by rolling one of the transactions back. It surfaces to the user as a bug that they can easily monitor for and diagnose. With optimistic locking+retry, which serializability basically requires, you can get livelock. Livelock is terrible, the cluster appears to be busy doing useful work but just gets slower and slower. There's not good ways to automatically detect it, at least not in any app framework I've ever encountered.
Disclosure: in the Oracle DB group, opinions are my own and not that of the company. I actually argue with my manager about this topic sometimes, lol.
Personally, I don't think databases or database libraries should specify a default isolation level at all, and that it needs to be explicitly specified. But legacy code and backwards compatibility and new user experience and all that. I think most of the issues I pointed out come down to needing to be aware of the consequences of your choices, or not even being aware that a choice has been made that can be very hard to change retroactively.
Yeah, but it seems so strange to me. Imagine if a database simply executed all transactions serially. Then there would be no serialization anomalies (though it would be slow, yada yada). So it seems serializable isolation presents a facade of serial execution, but only like, half a facade. You have to deal with the leak in the abstraction yourself and it's surprising to everyone who hears about it for the first time. I wonder why this choice was made.
The one complication is that this does mean applications need to be prepared for queries to need to be retried, and while this is reasonably straightforward in most cases (and can in principle be enforced with static analysis), it's not always done today, which makes it hard to change the defaults.
Well also I know Postgres UNIQUE indexes provide additional locking. Like you can do an INSERT... WHERE NOT EXISTS or INSERT... ON CONFLICT that is guaranteed to succeed.
That's true only for the latter (and even then only at a isolation level that's not too strict).
As an example: Oracle and PostgreSQL don't have dirty reads: READ UNCOMMITTED does nothing. MySQL's concurrency model depends on the engine.
I don't want to commit to a real opinion, but the cynic in me sees a bitter lesson you could take from this is that the database should default to a low isolation level--the damn developers aren't even using transactions right, so why waste performance handling transactions in the strictest possible way?
And database performance is a crapshoot. By “crapshoot”, I mean you mix some developers, data, and database technologies together and you get all sorts of weird performance profiles. Why sacrifice safety for performance, when you’re going to get performance problems anyway? It’s a lot easier to turn down the isolation level for a transaction, and it’s a lot harder to fix an unknown amount of inconsistent data in your database.
(The problem is that there are also a big cohort of developers who aren’t prepared to make their transactions retryable, and they’re writing their code in languages which don’t enforce transaction safety. By “languages which don’t enforce transaction safety”, I mean “languages other than Haskell”.)
I expected the article to substantiate the claim that serializable brings a large performance hit as in my experience it isn't so. The article basically makes the same point.
With serializable, you need to be a little careful not to have hot rows. Avoid them by sharding commonly written values. Another way to improve performance is to use true time for ordering non read-then-write transactions. It's a little finicky if the database doesn't provide such guarantees out of the box. Take Google's Spanner as an example. It offers the serializable isolation level and it's pretty performant (as long as you account for hot spots).
Unfortunately, serializable isolation requires detecting or preventing read-write conflicts (i.e. one transaction writing a row that a concurrent transaction has read). This is the performance impact of serializability: you need to be very careful what you read, because if you read too many rows you prevent any concurrent transactions from updating those same rows. Read-only transactions are OK (because MVCC), and read-only tables are OK (because there's no read-write conflict if a table is mostly read only), but tables that are both written and heavily read are where you get performance problems.
With snapshot isolation (e.g. Oracle's serializable, Postgres repeatable read), only write-write conflicts matter. There it doesn't matter what a transaction reads, and reads never need to block (or abort) writers. So what you say is true for snapshot, but not for serializable.
Interestingly, serializable's lack of need to detect write-write conflicts means that (in some implementations) it can be faster than snapshot for blind writes (i.e. anything that's not a read-modify-write under the covers).
My broader point is that with serializable you need to be aware of these bottlenecks in the database and you need to create a data model and access patterns such that permance hits are avoided as much as possible.
This is a rabbit hole worth going down, but it shouldn't be the default. This is a classic case of Chesterton's Fence.
I'm a gamedev. I've worked alongside webdevs (frontend and backend) that build our websites and forums. Alongside coworkers who handle networking stuff while I port things on the same project. Spotted SQLIs for people and pointed them on a better path [1]. I've dabbled in my own share of SQL-adjacent queries... which is to say databases have always been on my list of things I should probably take the time to put properly into my toolkit, for increased reliability and data durability. After all, rotating file snapshots by hand, and fuzzing formats to create recoverability from corruption (if only by detecting it and reverting to previous snapshots instead of crashing or corrupting further) is clearly the work of uncultured barbarians, bereft the wonders of proper fsync-aware ACID storage technologies.
And then I read this:
---------------------------------------------
However, many database vendors use weaker isolation levels by default, in particular:
• “Read committed” in PostgreSQL and Oracle.
• “Repeatable read” in MySQL/InnoDB (there is a subtlety, see below) in YugabyteDB.
---------------------------------------------
I regret the clearly undue respect and regard I've given to database technology. I knew some of this kind of nonsense had intruded with the NoSQL and sharding crowd, but I thought you at least had to ask for such ruination for most of the SQLs used in production, at least in the context of a singular database. Euhg.
1. https://blog.codinghorror.com/give-me-parameterized-sql-or-g...
mybalance = database.read("account-number")
newbalance = mybalance - amount
database.write("account-number", newbalance)
dispense_cash(amount) // or send bitcoins to customer
and MongoDB didn't even have a way to do this atomically? An RDBMS with read-committed would handle this fine if you did "read for update" on that row.