Immutability Changes Everything (2016) [pdf]

(cidrdb.org)

118 points · fire_lake · 1y ago · 42 comments


gatane · 1y ago

My main gripe with immutability is that making updated data requires building a full copy of the data again with the changes. Sure, you could have zippers to aid in the updating process by acting as a kind of cursor/pointer, but raw access to data beats them anytime (even if you optimize for cache).

So if you had to optimize for raw speed, why not choose mutable data?

https://ksvi.mff.cuni.cz/~sefl/papers/zippers.pdf

dsQTbR7Y5mRHnZv · 1y ago

> My main gripe with immutability is that making updated data requires building a full copy of the data again with the changes.

Conceptually yes, but the implementation doesn't always necessarily need to work that way under the hood: https://www.roc-lang.org/functional#opportunistic-mutation


KingMob · 1y ago

> My main gripe with immutability is that making updated data requires building a full copy of the data again with the changes.

That's not generally true. Many immutable languages are using "persistent" data structures, where "persist" here means that much of the original structure persists in the new one.

For more, see:

- Purely Functional Data Structures by Okasaki: https://www.cs.cmu.edu/~rwh/students/okasaki.pdf
- Phil Bagwell's research, e.g., https://infoscience.epfl.ch/record/64398/files/idealhashtree...
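
To make the "persistent" idea concrete, here is a minimal sketch of path copying in an immutable binary search tree. The `Node` class and `insert` function are made up for illustration and are not taken from Okasaki or Bagwell; the point is only that an update copies the nodes on the search path and shares everything else.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable binary search tree node (hypothetical, for illustration)."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Return a new tree containing key; only nodes on the search path are copied."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: reuse the whole subtree


old = None
for k in (50, 30, 70, 20, 40):
    old = insert(old, k)

new = insert(old, 60)

# The left subtree is not on the path to 60, so both versions share it.
print(new.left is old.left)  # True
print(new is old)            # False
```

For a balanced tree this copies O(log n) nodes per update rather than the whole structure, and the old version stays valid.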

munchler · 1y ago

> My main gripe with immutability is that making updated data requires building a full copy of the data again with the changes.

That is not true in general. There are plenty of data structures that can be updated without forcing a full copy. Lists, trees, sets, maps, etc. All of these are common in functional programming. This is discussed in the article (e.g. "Append-Only Computing").

sarchertech · 1y ago

If you really care about performance, iterating over all of those is going to be much, much slower than iterating over an array.


mrkeen · 1y ago

Someone should try it with postgres. Make a raw speed branch that gets rid of the overhead of mvcc:

  while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data

  https://www.postgresql.org/docs/7.1/mvcc.html

ahoka · 1y ago

That’s not exactly how PostgreSQL works. This is true only at certain isolation levels.

cratermoon · 1y ago

https://dl.acm.org/doi/10.1145/356635.356640

LeftHandPath · 1y ago

Immutability is a fantastic tool, especially when working with enterprise data. It's relatively easy to implement your own temporal tables on most existing databases, no special libraries or tools required. It seems really trivial/obvious, but I'll admit I first stumbled into the concept using the AS400 at work. If you make a mistake on payroll in IBM's old MAPICS program, you don't overwrite or delete it. You introduce a new "backout record" to nullify it, then (maybe) insert another record with the correct data. It seems obvious once you've seen the pattern.

I've made a few non-technical eyes go wide by explaining A) that this is done and B) how it is done. The non-tech crypto/blockchain enthusiasts I've met get really excited when they learn you can make a set of data immutable without blockchain / merkle trees. Actually, explaining that is a good way to introduce the concept of a merkle tree / distributed ledger, and why "blockchain" is specifically for systems without a central authority.

(Bi)Temporal and immutable tables are especially useful for things like HR, PTO, employee clock activity, etc. Helps keep things auditable and correct.
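
The backout-record pattern can be sketched in a few lines. This is an in-memory illustration only, not how MAPICS or any particular database does it; `Entry`, `Ledger`, and the field names are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    entry_id: int
    employee: str
    amount_cents: int
    backs_out: Optional[int] = None  # id of the entry this one nullifies


class Ledger:
    """Append-only: entries are added, never updated or deleted."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, employee: str, amount_cents: int,
               backs_out: Optional[int] = None) -> Entry:
        entry = Entry(len(self._entries) + 1, employee, amount_cents, backs_out)
        self._entries.append(entry)
        return entry

    def back_out(self, entry_id: int) -> Entry:
        """Nullify an earlier entry by appending its negation."""
        original = self._entries[entry_id - 1]
        return self.append(original.employee, -original.amount_cents,
                           backs_out=entry_id)

    def balance(self, employee: str) -> int:
        return sum(e.amount_cents for e in self._entries
                   if e.employee == employee)

    def history(self) -> List[Entry]:
        return list(self._entries)


ledger = Ledger()
wrong = ledger.append("alice", 120_000)  # mistake: should have been 102_000
ledger.back_out(wrong.entry_id)          # backout record nullifies it
ledger.append("alice", 102_000)          # corrected record

print(ledger.balance("alice"))  # 102000
print(len(ledger.history()))    # 3
```

The balance comes out right, and the mistake plus its correction both stay visible in the history, which is the auditability part.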

layer8 · 1y ago

Without specific support from the RDBMS, bitemporal schemas are difficult with regard to cross-table references, such as foreign keys. Rows that need to be consistent between tables aren’t necessarily 1:1 anymore, but instead each row in one table needs to be consistent with all corresponding rows in the other table having an intersecting time interval. You then run into problems with transaction isolation and visibility.
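
A rough sketch of why the reference check stops being 1:1, assuming half-open validity intervals and a single time axis (a real bitemporal schema would add transaction time as a second axis). `Version`, `matching_versions`, and `fully_covered` are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Version:
    dept_id: str     # the department this row is, or refers to
    valid_from: int  # inclusive
    valid_to: int    # exclusive


def intersects(a: Version, b: Version) -> bool:
    return a.valid_from < b.valid_to and b.valid_from < a.valid_to


def matching_versions(child: Version, parents: List[Version]) -> List[Version]:
    """All parent versions for the same department that overlap the child."""
    return [p for p in parents
            if p.dept_id == child.dept_id and intersects(child, p)]


def fully_covered(child: Version, parents: List[Version]) -> bool:
    """True if the overlapping parent versions cover the child's interval without a gap."""
    cursor = child.valid_from
    for p in sorted(matching_versions(child, parents),
                    key=lambda v: v.valid_from):
        if p.valid_from > cursor:
            return False  # gap: the reference dangles for part of the interval
        cursor = max(cursor, p.valid_to)
    return cursor >= child.valid_to


departments = [Version("eng", 0, 10), Version("eng", 10, 20)]
assignment_ok = Version("eng", 5, 15)   # spans two department versions
assignment_bad = Version("eng", 15, 25) # outlives the department

print(len(matching_versions(assignment_ok, departments)))  # 2
print(fully_covered(assignment_ok, departments))           # True
print(fully_covered(assignment_bad, departments))          # False
```

A plain foreign key has no way to express "covered by some set of rows", which is why this usually ends up in triggers or application code without RDBMS support.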

pyrale · 1y ago

> bitemporal schemas are difficult with regard to cross-table references

Who needs more than one table? >:)

More complex models can be built and stored separately. The great benefit of this method being that, once you're unhappy with your table model, you can trash it and rebuild it from scratch without regard for data migration.


hobs · 1y ago

Pretty much, you want triggers to store things in a schemaless fashion in an audit format so that you are free to migrate tables.

This does require either knowing the schema at the point in time or recording enough information to do a schema on read.

The other option, of course, is to run a table like an API: always adding, never removing.

refset · 1y ago

> It's relatively easy to implement your own temporal tables on most existing databases

It gets tricky when you need to change the schema without breaking historical data or queries. SQL databases could do a lot more to make immutability easier and widespread.

jiggawatts · 1y ago

One fundamental issue I’ve noticed is that typical SQL databases have a single schema per table defining both the logical and physical aspects, typically with a strong correlation between the two.

Databases could treat the columns as the fundamental unit with tables being not much more than a view of a bunch of columns that can change over both space (partitioning) and time (history).


teleforce · 1y ago

>Actually, explaining that is a good way to introduce the concept of a merkle tree / distributed ledger, and why "blockchain" is specifically for systems without a central authority

This is a very important point. For whatever system or solution you build, do not overengineer, and always remember that premature optimization is the root of all evil.

It used to be blockchain, and now ML/AI seems to be the new fad. Most likely the majority of the solutions being designed with ML/AI today do not need it, and using it just makes them expensive/slow/complex/non-deterministic/etc.

People need to wake up and smell the coffee, since ultimately ML/AI is just one tool in a toolbox of many.

cowsandmilk · 1y ago

The “right to be forgotten” has caused a lot of conflicts with certain immutable data stores. If I can reconstruct a snapshot with a user’s data, have I actually “forgotten” them? Having a deadline where the merges fully occur and old data is rendered inaccessible is sometimes necessary legally.

hcarvalhoalves · 1y ago

You can always "redact" previous data. You can treat the sensitive entries themselves as mutable, without it breaking the system design around immutable data.

I have also seen a scheme where you store the hash, and have a separate lookup table for sensitive data, that you can redact more easily without messing with the log.
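
A minimal sketch of that hash-plus-lookup-table scheme, under some assumptions of mine: the log and the side table are plain in-memory structures, and a random per-record salt is mixed into the hash so the token cannot be recomputed by guessing the value. The names `log`, `vault`, `record`, `resolve`, and `redact` are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, List, Optional, Tuple

log: List[Tuple[str, str]] = []  # append-only: (event, token), never rewritten
vault: Dict[str, str] = {}       # mutable side table: token -> sensitive value


def record(event: str, sensitive_value: str) -> str:
    """Append an event to the log, keeping the sensitive value out of it."""
    # Assumption: a throwaway salt, so a guessed value cannot be matched to a token.
    salt = secrets.token_hex(16)
    token = hashlib.sha256((salt + sensitive_value).encode()).hexdigest()
    vault[token] = sensitive_value
    log.append((event, token))
    return token


def resolve(token: str) -> Optional[str]:
    """Look up the sensitive value, or None if it has been redacted."""
    return vault.get(token)


def redact(token: str) -> None:
    """Forget the sensitive value; the log itself is left untouched."""
    vault.pop(token, None)


token = record("signup", "alice@example.com")
print(resolve(token))  # alice@example.com

redact(token)
print(resolve(token))  # None
print(len(log))        # 1
```

After redaction the log still replays, but the token no longer leads anywhere. Backups of the side table would need the same treatment, which is the point raised in the reply below.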

mrkeen · 1y ago

Likewise with database backups.

prydt · 1y ago

One of my favorite papers! This reminds me of Martin Kleppmann's work on Apache Samza and the idea of "turning the database inside out" by hosting the write-ahead log on something like Kafka and then having many different materialized views consume that log.

Seems like a very powerful architecture that is both simple and decouples many concerns.
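
As a toy illustration of the shape of that architecture, here is one append-only log feeding several independently built views. This is an in-memory stand-in, not Kafka or Samza code; the `Log` class, the event layout, and the view names are all assumptions for the sketch.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str, int]  # (user, item, quantity)


class Log:
    """Append-only event log that pushes each event to its subscribers."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        # Replay history first so a view added late still catches up.
        for event in self._events:
            handler(event)
        self._subscribers.append(handler)

    def append(self, event: Event) -> None:
        self._events.append(event)
        for handler in self._subscribers:
            handler(event)


totals_by_user: Dict[str, int] = defaultdict(int)
totals_by_item: Dict[str, int] = defaultdict(int)


def update_user_view(event: Event) -> None:
    user, _item, quantity = event
    totals_by_user[user] += quantity


def update_item_view(event: Event) -> None:
    _user, item, quantity = event
    totals_by_item[item] += quantity


log = Log()
log.append(("alice", "book", 2))
log.subscribe(update_user_view)
log.append(("bob", "book", 1))
log.append(("alice", "pen", 3))
log.subscribe(update_item_view)  # built later, entirely from the log

print(dict(totals_by_user))  # {'alice': 5, 'bob': 1}
print(dict(totals_by_item))  # {'book': 3, 'pen': 3}
```

Either view can be thrown away and rebuilt by replaying the log, which is where the decoupling comes from.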

082349872349872 · 1y ago

In their 1992 Transaction Processing book*, Gray and Reuter extrapolate h/w and s/w trends forward and predict that the DBMS of their far future would look like a tape robot for backing store with materialised views in main memory.

Substitute streams for tape i/o, and this description of Samza sounds like it could be very similar to that vision.

* as far as I know, their exposition of the WAL and tradeoffs in its implementation has aged well. Any counter opinions?

gsf_emergency_2 · 1y ago

Thanks!

gleenn · 1y ago

I love the quote "accountants don't use erasers". So many things should be modeled over time and keep track of change right out of the gate. Little things like Ruby on Rails always adding timestamps to model tables were super helpful but also a bit of a code smell. If this is obvious enough to be useful everywhere, what is the next level? One more reason Datomic is so cool: nothing is overwritten, it is overlaid with a newer record, and you can always look back and take a slice of the db at a specific time and have a complete and consistent view of the universe at that time. Immutability!
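
The "slice of the db at a specific time" idea can be sketched with a list of timestamped facts. To be clear, this is not Datomic's API or data model, just a hypothetical `assert_fact` / `as_of` pair showing that a point-in-time view falls out for free when nothing is overwritten.

```python
from typing import Any, Dict, List, Tuple

Fact = Tuple[int, str, str, Any]  # (tx_time, entity, attribute, value)

facts: List[Fact] = []  # append-only


def assert_fact(tx_time: int, entity: str, attribute: str, value: Any) -> None:
    """Record a new fact; earlier facts are never modified."""
    facts.append((tx_time, entity, attribute, value))


def as_of(t: int) -> Dict[Tuple[str, str], Any]:
    """Latest value of every (entity, attribute) pair as of time t."""
    view: Dict[Tuple[str, str], Any] = {}
    for tx_time, entity, attribute, value in sorted(facts, key=lambda f: f[0]):
        if tx_time <= t:
            view[(entity, attribute)] = value  # newer facts overlay older ones
    return view


assert_fact(1, "alice", "email", "alice@old.example")
assert_fact(2, "alice", "team", "eng")
assert_fact(3, "alice", "email", "alice@new.example")

print(as_of(2)[("alice", "email")])  # alice@old.example
print(as_of(3)[("alice", "email")])  # alice@new.example
print(len(facts))                    # 3
```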

yencabulator · 1y ago

Accountants also have trivially simple schemas. (Though lots of complexity elsewhere.)

lbj · 1y ago

I have to say, I really love the title :)

cacozen · 1y ago

I guess “Immutability changes nothing” wouldn’t have the same impact

niuzeta · 1y ago

Semi-related, but is there any repository (or repositories) that collects these technical white papers? I'm fascinated by these papers whenever they show up in my feed and I gorge on them, and I'd love more. I can't be the only one thinking this way.

ahoka · 1y ago

I can recommend Adrian Colyer's excellent The Morning Paper blog: https://blog.acolyer.org/

dang · 1y ago

Immutability Changes Everything (2016) - https://news.ycombinator.com/item?id=27640308 - June 2021 (94 comments)

Immutability Changes Everything - https://news.ycombinator.com/item?id=10953645 - Jan 2016 (4 comments)

Immutability Changes Everything [pdf] - https://news.ycombinator.com/item?id=8955130 - Jan 2015 (25 comments)

(Reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers)

skybrian · 1y ago

Editors and form validation are where this gets tricky. The user isn't just reporting new, independent observations to append to a log. They're looking at existing state and deciding how to react to it. Sometimes avoiding constraint violations with other state that they're not looking at is also important.

It often works out, but if you're not looking at the right version then you're risking a merge conflict.

sstanfie · 1y ago

Needs more exclamation points!
