0 points · saurik · 13y ago · 0 comments

First, there is a major difference between MVCC and update-in-place that you can detect as a client, and that difference is that the problems that Rich outlines at the beginning of his talk do not happen: if one client edits something in the database, other transactions do not get an inconsistent view, as they would if the data on disk had already been permanently and irrevocably "updated in place". (Which, to be clear, means that modern SQL databases do not "expose a view to the world of an update in place database".)

Second, if all that is required to get his model is to add a command to an existing database such as PostgreSQL that says "mark the current transaction read-only and pretend that it is as old as transaction X" (and I feel I know enough about how PostgreSQL works to be confident that this could be implemented quite rapidly), we really aren't talking about something that is either very new, or that totally reinvents the "traditional database".
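
To make the "pretend that it is as old as transaction X" idea concrete, here is a toy sketch in Python. It is not PostgreSQL's implementation: `ToyMVCC`, `write`, and `read_as_of` are hypothetical names, every write is treated as its own committed transaction, and old versions are assumed to be kept forever (a real PostgreSQL instance eventually vacuums dead row versions, so such a command would only work while the old versions still exist).

```python
class ToyMVCC:
    """Toy multi-version store: each write appends a version tagged with a txid."""

    def __init__(self):
        self.next_txid = 1
        self.versions = {}  # key -> list of (txid, value), in commit order

    def write(self, key, value):
        """Commit a new version of `key` and return the committing txid."""
        txid = self.next_txid
        self.next_txid += 1
        self.versions.setdefault(key, []).append((txid, value))
        return txid

    def read_as_of(self, key, txid):
        """Read-only view 'as old as transaction txid': newest version with
        version_txid <= txid, or None if the key did not exist yet."""
        result = None
        for version_txid, value in self.versions.get(key, []):
            if version_txid <= txid:
                result = value
        return result


db = ToyMVCC()
t1 = db.write("email", "old@example.com")
t2 = db.write("email", "new@example.com")

print(db.read_as_of("email", t1))  # old@example.com
print(db.read_as_of("email", t2))  # new@example.com
```

The only point of the sketch is that the visibility rule "show me versions no newer than X" is the same rule a multi-version store already applies to every running transaction.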

6 comments · 2 top-level

DanWaterworth · 13y ago · 4 in thread

> Which, to be clear, means that modern SQL databases do not "expose a view to the world of an update in place database".

I disagree with you on this point. MVCC may prevent you from suffering from locking problems, but you can still, from a user perspective, modify rows. A row is a place and you are updating it. It's definitional.

On your second point: the data model that Datomic databases expose is very different from that of SQL databases. That's enough of a difference to say that it's fundamentally new. Furthermore, I don't think anyone would disagree that the architecture of Datomic is very different from that of PostgreSQL.

When you compare a distributed database that uses immutable data against PostgreSQL, the thing that is immediately apparent to me is that garbage collection is much more difficult in the distributed setting. You can't just rewrite the network interface for PostgreSQL and get Datomic, but you might be able to get single-server Datomic.

I think you are seeing a simple design and thinking, "anyone could have thought that up", when actually it's not that easy. The design that you are looking at has obviously gone through extensive refinement.

jeltz · 13y ago

> When you compare a distributed database that uses immutable data against PostgreSQL, the thing that is immediately apparent to me is that garbage collection is much more difficult in the distributed setting. You can't just rewrite the network interface for PostgreSQL and get Datomic, but you might be able to get single-server Datomic.

Postgres-XC solves this problem by adding a global transaction management server. I guess it does the same thing as the transactor for Datomic, so there really is no major difference here.

saurik (OP) · 13y ago

In your last paragraph, I feel like you are mischaracterizing my overall thesis. I am not claiming the design is simple: MVCC took many lives in sacrifice to its specification and discovery, and I certainly am not claiming "anyone could have thought that up". Instead, my primary issue is that this is a talk about databases and database design that motivates itself against a strawman: specifically, the way Rich seems to believe "traditional databases" work, and for which we spend the first almost 20 minutes learning the negatives, roadblocks, and general downsides.

However, almost none of the things that he indicates actually are downsides of most modern database systems, and certainly not of PostgreSQL. His downsides include that the data structuring is simplistic, that you can't have efficient and atomic replication of it (not multi-master mind you, but seemingly even doing real-time replication of a master to a read-only slave while maintaining serialization semantics seems to be dismissed), and that if you attempt to make multiple queries you will get inconsistent data due to update-in-place storage.

Yes: update-in-place "storage", not "update-in-place semantics within the scope of an individual transaction". Even if he was very clear about the latter (which is again quite different from "update-in-place semantics", which MVCC definitely does not have), that would still undermine his points, as the problem of inconsistent data from multiple reads, a problem he goes into great detail about with an example involving a request for a webpage that needs to make a query first for its backend data and then for its display information, does not exist with MVCC.
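
The two-query webpage example can be sketched as a toy comparison in Python. Everything here is hypothetical (`InPlaceStore`, `SnapshotStore`, the `body`/`layout` keys), and copying a dict is only a stand-in for a snapshot; a real MVCC engine keeps row versions rather than copying data. In PostgreSQL terms, the snapshot case assumes both queries run inside one transaction whose snapshot is taken once (for example at `REPEATABLE READ`), since the default `READ COMMITTED` level takes a fresh snapshot per statement.

```python
class InPlaceStore:
    """Readers see whatever is currently stored; writers overwrite it."""

    def __init__(self, data):
        self.data = dict(data)

    def read(self, key):
        return self.data[key]

    def update(self, changes):
        self.data.update(changes)


class SnapshotStore:
    """Readers take a frozen view; writers publish a new committed state."""

    def __init__(self, data):
        self.committed = dict(data)

    def snapshot(self):
        # Stand-in for an MVCC snapshot: a view that later commits cannot change.
        return dict(self.committed)

    def update(self, changes):
        self.committed = {**self.committed, **changes}


initial = {"body": "v1 body", "layout": "v1 layout"}
edit = {"body": "v2 body", "layout": "v2 layout"}

# Update-in-place: another client commits between the page's two queries.
in_place = InPlaceStore(initial)
body = in_place.read("body")
in_place.update(edit)
layout = in_place.read("layout")
torn_page = (body, layout)

# Snapshot reads: the same interleaving, but both queries use one snapshot.
mvcc = SnapshotStore(initial)
snap = mvcc.snapshot()
body = snap["body"]
mvcc.update(edit)
layout = snap["layout"]
consistent_page = (body, layout)

print(torn_page)        # ('v1 body', 'v2 layout')
print(consistent_page)  # ('v1 body', 'v1 layout')
```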

During this discussion of storage, he specifically talks about how existing database storage systems work, not at the model level, but at the disk level, discussing how b-trees and indexes are implemented with their destructive semantics... and all of these details are wrong, at least for PostgreSQL and Oracle, and I believe even for MySQL InnoDB (although a lot of its MVCC semantics are in-memory-only AFAIK, so I'm happily willing to believe that it actually destroys b-tree nodes on disk).

The talk then discusses a new way of storing data, and that new way of storing data happens to share the key property he calls new with the old way of storing data. The result is that it is very difficult to see why I should be listening to this talk, as the speaker either doesn't know much about existing database design or is purposely lying to me to make this new technology sound more interesting :(. Your response that in a different talk he attempted to backpatch his argument with something that still doesn't seem to address MVCC's detectably-not-the-same-as-update-in-place-semantics doesn't help this.

Now, as I stated up front, after listening to half of this talk, I couldn't take it anymore, and I gave up: I thereby didn't hear an entire half hour of him speaking. Maybe somewhere in that second half there is something new about how some particular quirk of his model allows you to get a distributed system, but that seemed sufficiently unlikely after the first half that it really doesn't seem worth it, and based on the comments from the discussion (such as in the threads started by bsaul and sriram_malhar, which seem to indicate that writes are centralized and reads are distributed, something you can do with any off-the-shelf SQL solution these days) that seems to hold up.

richhickey · 13y ago

The model of consistency envisioned by Datomic is one in which consistency normally available only within a transaction is available outside of any transactions, and without any central authority. Consistent views can be reconstituted the next hour, day or week. Consistent points in time can be efficiently communicated to other processes. Nothing about MVCC gives you any of that. MVCC is an implementation detail that reduces coordination overhead in transactional systems. I used MVCC in the implementation of Clojure's STM. While you might imagine it being simple to flip a bit on an MVCC system and get point-in-time support, it is a) not efficient to do so, and b) still a coordinated transactional system.

The differences I am pointing out, and the notion of place I discuss, are not about the implementation details in the small (e.g. whether or not a db is MVCC or updates its btree nodes in place) but the model in the large. If you 'update' someone's email, is the old email gone? Must you be inside a transaction to see something consistent? Is the system oriented around preserving information (the facts of events that have happened), or is the system oriented around maintaining a single logical value of a model?

The fact is with PostgreSQL et al, if you 'update' someone's email the old one is gone, and you can only get consistency within a transaction. It is a system oriented around maintaining a single logical value of a model. And there's nothing wrong with that - it's a great system with a lot of utility. But it isn't otherwise just because you say it could be.
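
The email question can be illustrated with a toy contrast in Python between a model that maintains a single logical value and one that accumulates facts. This is only a sketch: `SingleValueModel` and `FactLog` are hypothetical names and do not reflect Datomic's actual API or storage; the time points are simply positions in an in-memory list.

```python
class SingleValueModel:
    """Maintains one current value per (entity, attribute); updates replace it."""

    def __init__(self):
        self.rows = {}

    def update(self, entity, attr, value):
        self.rows[(entity, attr)] = value  # the previous value is gone

    def get(self, entity, attr):
        return self.rows.get((entity, attr))


class FactLog:
    """Accumulates facts; nothing is overwritten, and any past point can be read."""

    def __init__(self):
        self.facts = []  # list of (t, entity, attr, value), t strictly increasing

    def assert_fact(self, entity, attr, value):
        t = len(self.facts) + 1
        self.facts.append((t, entity, attr, value))
        return t

    def as_of(self, t, entity, attr):
        """Value of (entity, attr) as of time point t; no transaction needed."""
        result = None
        for fact_t, e, a, value in self.facts:
            if fact_t <= t and (e, a) == (entity, attr):
                result = value
        return result


table = SingleValueModel()
table.update("alice", "email", "old@example.com")
table.update("alice", "email", "new@example.com")
print(table.get("alice", "email"))  # new@example.com (the old email is gone)

log = FactLog()
t1 = log.assert_fact("alice", "email", "old@example.com")
t2 = log.assert_fact("alice", "email", "new@example.com")
print(log.as_of(t1, "alice", "email"))  # old@example.com
print(log.as_of(t2, "alice", "email"))  # new@example.com
```

A time point such as `t1` is just a value, so it can be handed to another process or reused later to reconstitute the same view.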

Also, you seem to be reacting as if I (or someone) has claimed that Datomic is revolutionary. I have never made such claims. Nothing is truly novel, everything has been tried before, and we all stand on the shoulders of giants.

I'm sorry my talk didn't convey to you my principal points, and am happy to clarify.

fauigerzigerk · 13y ago

I agree with most of what you say, but I don't think that MVCC is really what this is about. The qualities you describe are a feature of ACID. MVCC is just a way of implementing ACID so that it requires less locking.

More importantly, I think, there are issues with some data structures that are not well supported by postgres or any other DBMS (relational or otherwise). I do a lot of text analytics work and there are things I need to store about spans of text that I could model in a relational fashion but I don't because it would lead to 99% of my data being foreign keys and row metadata.

There will always be domains where you need highly specialized combinations of data structures and algorithms that are not efficient to model relationally, and even less so in terms of some of the other data models that you find in the NoSQL space.

That said, I found that even in natural language processing, RDBMS do a lot of things surprisingly more efficiently than conventional wisdom would have it. Storing lots of small files, for instance, is something that file systems are surprisingly bad at.

Sometimes I'm surprised how many people like to complain about premature optimization using languages that are hundreds of times slower than others but then go ahead and use horribly inflexible crap like the BigTable data model just in case they need to scale like Google.

Of course that's off topic because it's not remotely what Hickey proposes.

azolotko · 13y ago

AFAIK there was a Postgres extension that did something like that. But then it became deprecated and was eventually removed.
