I think it’s more tractable to define this problem space starting from the concept of (strict) serializability, which is really a generalization of the concept of thread safety. Every software engineer has an intuitive understanding of it. Lack of serializability can lead to execution-dependent behavior, which usually results in hard-to-diagnose bugs. Thus, all systems should strive towards serializability, and the database can be a tool in achieving it.
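To make "execution-dependent behavior" concrete, here is a toy sketch of a lost update. It is not a real database: the `run` function and the schedule format are made up for illustration, and the interleaving is written out by hand so the result is deterministic.

```python
# Toy model (hypothetical, not a real database): two "transactions", A and B,
# each read a shared counter and then write back the value they read plus one.

def run(schedule):
    """Execute (txn, step) pairs in the given order and return the final counter."""
    store = {"counter": 0}
    local = {}  # the value each transaction last read
    for txn, step in schedule:
        if step == "read":
            local[txn] = store["counter"]
        elif step == "write":
            store["counter"] = local[txn] + 1
    return store["counter"]

# Serial schedule: A finishes before B starts.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# Interleaved schedule: both read before either writes.
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

serial_result = run(serial)            # 2: both increments survive
interleaved_result = run(interleaved)  # 1: B overwrites A's increment

print(serial_result, interleaved_result)
```

The interleaved result matches neither serial order (A then B, or B then A), which is exactly what a serializable system rules out.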
Various non-serializable levels of database transaction isolation are relaxations of the serializability guarantee, where the database no longer enforces the guarantee and it’s up to the database user to ensure it through other means.
The isolation phenomena are a useful tool for visualizing various corner cases of non-serializability, but they are not inherently tied to it. It's possible to achieve serializability while observing all of the SQL phenomena. For example, a Kubernetes cluster with carefully-written controllers can be serializable.
The combination of transactions, isolation levels, and MVCC is a huge undertaking to cover all at once, especially when comparing how it's done across multiple DBs, which I attempted here. It's always a balance between technical depth, accessibility to people with less experience, and not letting it turn into an hour-long read.
If anything, I’d say it might be better to start with the lower isolation levels first, highlight the concurrency problems that can arise with them, and gradually introduce higher isolation levels until you get to serializability. That feels a bit more intuitive than the downward progression from serializability to read uncommitted presented here.
It also might be nice to see a quick discussion of why people choose particular isolation levels in practice, e.g. why you might make a tradeoff under high concurrency and give up serializability to avoid waits and deadlocks.
But excellent article overall, and great visualizations.
More notation, more citations, more better.
Unsure why "strict" (L + S) is in parentheses: isn't linearizability ("L") what most resembles thread safety in SMP systems?
* the videos should have "pause" and a "step at a time" control *
Even at "half speed", without deep knowledge of the context, the videos move way too fast for me to read the syntax being invoked and line it up with the data on the left side. I (and I'm definitely not the only one) need to be able to sit on one step and stare at the whole thing without the latent anxiety of the state changing before I've had a chance to grok it.
this has nothing to do with familiarity with the concepts (read my profile). I literally need time to read all the words and connect them together mentally (ooh, just noticed this is pseudo-SQL syntax also, e.g. "select id=4", that probably added some load for me) without worrying they're going to change before watching things move.
please add a step-at-a-time button!
When you BEGIN a transaction, you're creating a branch in Git. Everyone else continues to work on the master branch, perhaps making their own branches (transactions) off of it while you're working. Every UPDATE command you run inside the transaction is a commit pushed to your branch. If you do a ROLLBACK, then you're deleting the branch unmerged, and its changes will be discarded without ever ending up in the master branch. But if you instead do a COMMIT, then that's a `git merge` command, and your changes will be merged into the master branch.

If they merge cleanly, then all is well. If they do NOT merge cleanly, because someone else merged their own branch (committed their own transaction) that touched the same files that you touched (updated rows in the same table), then the DB will go through the file line by line (go through the table row by row) to try to get a clean merge. If it can successfully merge both changes without conflict, great. If it can't, then what happens depends on the transaction settings you chose.

You can, when you start the transaction, tell the DB "If this doesn't merge cleanly, roll it back". Or you can say "If this doesn't merge cleanly, I don't care, just make sure it gets merged and I don't care if the conflict resolution ends up picking the 'wrong' value, because for my use case there is no wrong value." This is like using "READ UNCOMMITTED" vs "SERIALIZABLE" transaction settings (isolation levels): you would use "READ UNCOMMITTED" if you don't care about merge conflicts in this particular table, and just want a quick merge. You would use "SERIALIZABLE" for tables with data that must, MUST, be correct, e.g. account balances. And there are two more levels in between for subtle differences in your use case's requirements.
As with my previous comment, this is probably obvious to 98.5% of people here. But maybe it'll help someone get that "ah-ha!" moment and understand transactions better.
Using read-committed ofc means having to keep locking details in mind. Like, UNIQUE doesn't just guard against bad data entry, it can also be necessary for avoiding race conditions. But now that I know, I'd rather do that than take the serializable performance hit, and also have to retry xacts and deal with the other caveats at the bottom of https://www.postgresql.org/docs/current/transaction-iso.html
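A rough sketch of the UNIQUE point, using Python's built-in `sqlite3` purely as a stand-in (SQLite has no read-committed level, and the `users` table and `email_taken` helper are made up for illustration). The race is simulated by running both sessions' existence checks before either one inserts, which is the interleaving read committed would allow between two real connections.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is the actual guard; the SELECT check alone is racy.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

def email_taken(email):
    """Application-level check: is there already a row with this email?"""
    row = conn.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone()
    return row is not None

# Simulated race: both sessions check before either one inserts.
check_a = email_taken("a@example.com")
check_b = email_taken("a@example.com")

outcomes = []
for session, taken in (("A", check_a), ("B", check_b)):
    if taken:
        outcomes.append((session, "skipped"))
        continue
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO users (email) VALUES (?)", ("a@example.com",))
        outcomes.append((session, "inserted"))
    except sqlite3.IntegrityError:
        # The stale check said "not taken", but the constraint caught the duplicate.
        outcomes.append((session, "rejected by UNIQUE"))

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(outcomes, row_count)
```

Without the UNIQUE constraint, both inserts would succeed and the table would end up with two rows for the same email.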
https://dev.mysql.com/doc/refman/8.4/en/set-transaction.html...
https://mariadb.com/docs/server/reference/sql-statements/adm...
I don't know about MyISAM though (who uses it anyway ;-) ).
Oracle and SQL Server also default to read committed, not serializable. Serializable looks good in textbooks but is rarely used in practice.
The best implementation of serializable transactions I've seen is in FoundationDB but it comes with serious costs. Transactions are limited in size and duration to a point where many normal database operations are disallowed by the system and require app-layer workarounds (at which point, of course, you lose serializability). And in many cases you do need cluster locks for other purposes anyway.
It goes into not only different isolation levels, but also some ambiguity in the traditional ACID definition.
I believe a 2nd edition is imminent.
Am I missing something, or is this statement incomplete? Also, I find the visualization of commit weird: it “points to” the header of the table, but then xmax gets updated “behind the scenes”? Isn't xmax/xmin “the mechanism behind how the database knows what is committed/not committed”? Also, there could be subtransactions, which make this statement even more contradictory?
I enjoyed the visualizations and explanations otherwise, thanks!
If you do a ROLLBACK, then your private copy of the data is discarded, and its changes never make it into the official copy. But if you do a COMMIT, then your private snapshot is made public and is the new, official, copy for everyone else to read from. (Except those who started a transaction before you ran COMMIT: they made their private copies from the older snapshot and don't have a copy of your changes).
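Here is a small sketch of that "private copy" behavior, using Python's built-in `sqlite3` in WAL mode as a stand-in. That is an assumption on my part: Postgres and MySQL implement snapshots differently, and the `accounts` table and the `balance` helper are made up for the demo. Two connections to a temporary file play the writer and the reader.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: the sqlite3 module issues no implicit BEGIN,
# so every BEGIN/COMMIT/ROLLBACK below is explicit.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")  # lets a reader keep its snapshot while a writer commits
writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 100)")

reader = sqlite3.connect(path, isolation_level=None)

def balance(conn):
    """Read the balance of account 1 as seen by this connection."""
    return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchall()[0][0]

# ROLLBACK: the writer's private change is discarded.
writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 50 WHERE id = 1")
writer.execute("ROLLBACK")
after_rollback = balance(reader)     # 100

# The reader opens a transaction; its first read fixes its snapshot.
reader.execute("BEGIN")
snapshot_before = balance(reader)    # 100

# COMMIT: the writer's change becomes the new official copy.
writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 150 WHERE id = 1")
writer.execute("COMMIT")

snapshot_during = balance(reader)    # still 100: the reader started before the COMMIT
reader.execute("COMMIT")             # end the reader's transaction
after_reader_ends = balance(reader)  # 150: a fresh read sees the committed value

writer.close()
reader.close()
```

The reader only picks up the committed value once its own transaction ends, which is the "they made their private copies from the older snapshot" part of the analogy.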
This is probably obvious to nearly everyone here, but I figured I'd write it anyway. You never know who might read an analogy like this and have that lightbulb moment where it suddenly makes sense.
P.S. Another analogy would be Git branches, but I'll write that in a different comment.
SELECT followed by an update is the most usual case for a block. (I have to code one today, and I want to see if I can rewrite it as one MySQL statement.)
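For what it's worth, a minimal sketch of collapsing a SELECT-then-UPDATE into one statement, run against SQLite via Python's `sqlite3` so it is self-contained. The `accounts` table and `withdraw` function are hypothetical; the SQL is plain enough that something similar should work in MySQL, but I haven't tested it there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    """Debit the account only if it has enough funds.

    The balance check lives in the WHERE clause, so the check and the write
    are one statement rather than a SELECT followed by an UPDATE.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    return cur.rowcount == 1  # one row changed means the withdrawal happened

first = withdraw(conn, 1, 80)    # True: 100 >= 80
second = withdraw(conn, 1, 80)   # False: only 20 left, no row matches
final_balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]                  # 20

print(first, second, final_balance)
```

The caller learns whether the update applied from the affected-row count, so no separate read is needed.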
And no, I'd never expect people to know the isolation levels by heart, but if you know there are different ones and that they behave differently, that's pretty good and tells me you are curious about how things work under the hood.
> Under the SQL standard, the repeatable read level allows phantom reads, though in Postgres they still aren't possible.
This is bad wording which could give the impression that a repeatable read may show different values. Values in rows will be the same, but new rows may be added to the second result set. The "new rows" part is important: no previously read row can be changed or deleted, otherwise there would be no repetition for those rows the second time around.
If the data is fairly straightforward, like one-to-many CRUD with no circular references, you would be able to do it without transactions; table relationships alone would be enough to ensure consistency.