Erlang was a hair's breadth away from having mutation contained within the actor's variable space, with no external mutation, which for the time would have been quite revolutionary. Certainly Rust's mutation control is much richer, but Rust came a lot later, and at least based on its current compile performance, wasn't even on the table in the late 1990s.
But the sort of understanding of mutability explained in the original post was not generally understood. Immutability was not a brand new concept chronologically, but if you define the newness of a computer science concept as the integration of its usage over time, it was still pretty new by that metric; it had been bouncing around the literature for a long time but there weren't very many programming languages that used it at the time. (And especially if you prorate "languages" by "how easy it is to write practical programs".)
Elixir does a reasonable job of recovering it from the programmer's perspective, but I think an Erlang/BEAM that just embraced mutability within an actor probably would have done incrementally better in the programming language market.
I just cannot make this mental leap for whatever reason.
How does 'directly modify' relate to immutability? (I was sold the lie about using setters in OO a while back, which is also a way to prevent direct modification.)
From a the perspective of "mutability", how dangerous is this Python code?
x = 1
x = 2
print(x)
Normally little snippets like this should be understood as distilled examples of a general trend, but in this case I mean literally three lines. And the answer is, obviously, not at all. At least from the perspective of understanding what is going on. A later programmer reading this probably has questions about why the code is written that way, but the what is well in hand.As the distance between the two assignments scales up, it becomes progressively more difficult to understand the what. Probably everyone who has been in the field for a few years has at some point encountered the Big Ball Of Mud function, that just goes on and on, assigning to this and assigning to that and rewriting variables with wild abandon. Mutability makes the "what" of such functions harder.
Progressing up, consider:
x = [1]
someFunction(x)
print(x)
In Python, the list is mutable; if someFunction appends to it, it will be mutated. Now to understand the "what" of this code you have to follow in to someFunction. In an immutable language you don't. You still need to know what is coming out of it, of course, but you can look at that code and know it prints "[1]".However, this is still at least all in one process. As code scales up, mutation does make things harder to understand, and it can become hard enough to render the entire code base pathologically difficult to understand, but at least it's not as bad as this next thing.
Concurrency is when mutation just blows up and becomes impossible for humans to deal with. Consider:
x = [1]
print(x)
In a concurrent environment where another thread may be mutating x, the answer to the question "what does the print actually print?" is "Well, anything, really." If another thread can reach in and "directly" mutate x, at nondeterministic points in your code's execution, well, my personal assertion is nobody can work that way in practice. How do you work with a programming language where the previous code example could do anything, and it will do it nondeterministically? You can't. You need to do something to contain the mutability.The Erlang solution is, there is literally no way to express one actor reaching in to another actor's space and changing something. In Python, the x was a mutable reference that could be passed around to multiple threads, and they all could take a crack at mutating it, and they'd all see each other's mutations. In languages with pointers, you can do that by sharing pointers; every thread with a pointer has the ability to write through the pointer and the result is visible to all users. There's no way to do that in Erlang. You can't express "here's the address of this integer" or "here's a reference to this integer" or anything like that. You can only send concrete terms between actors.
Erlang pairs this with all values being immutable. (Elixir, sitting on top of BEAM, also has immutable values, they just allow rebinding variables to soften the inconvenience, but under the hood, everything's still immutable.) But this is overkill. It would be fine for an Erlang actor to be able to do the equivalent of the first example I wrote, as long as nobody else could come in and change the variable unexpectedly before the print runs. Erlang actors tend to end up being relatively small, too, so it isn't even all that hard to avoid having thousands of variables in a single context. A lot of Erlang actors have a dozen or two variables tops, being modified in very stereotypical manners through the gen_* interfaces, so having in-actor truly mutable variables would probably have made the language generally easier to understand and code in.
In the case of OO, the "direct mutation" problem is related to the fact that you don't have these actor barriers within the system, so as a system scales up, this thing "way over there" can end up modifying an object's value, and it becomes very difficult over time to deal with the fact that when you operate that way, the responsibility for maintaining the properties of an object is distributed over the entire program. Technically, though, I wouldn't necessarily chalk this up to "mutability"; even in an immutable environment distributing responsibility for maintaining an object's properties over the entire program is both possible and a bad idea. You can well-encapsulated mutation-based objects and poorly-encapsulated immutable values. I'd concede the latter is harder than the former, as the affordances of an imperative system seems to beg you to make that mistake, but it's certainly possible to accidentally distribute responsibilities incorrectly in an immutable system; immutability is certainly not a superset of encapsulation or anything like that. So I'd class that as part of what I mentioned in this post before I mentioned concurrency. The sheer size of a complex mutation-based program can make it too hard to track what is happening where and why.
Once you get used to writing idiomatic Erlang programs, you contain that complexity by writing focused actors. This is more feasible than anyone who hasn't tried thinks, and is one of the big lessons of Erlang that anyone could stand to learn. It is then also relatively easy to take this lesson back to your other programming languages and start writing more self-contained things, either actors running in their own thread, or even "actors" that don't get their own thread but still are much more isolated and don't run on the assumption that they can reach out and directly mutate other things willy-nilly. It can be learned as a lesson on its own, but I think one of the reasons that learning a number of languages to some fluency is helpful is that these sorts of lessons can be learned much more quickly when you work in a language that forces you to work in some way you're not used to.
The way you deal with that is, you have the object defend the consistency of the data that it controls. You have some kind of a mutex so that, when some thread is messing with certain data, no other thread can execute functions that mess with that data. They have to wait until the first thread is done, and then they can proceed to do their own operations on that data.
This has the advantage that it puts the data and the protection for the data in the same place. Something "way over there" can still call the function, but it will block until it's safe for it to modify the data.
(You don't put semaphores around all data. You think carefully about which data can be changed by multiple threads, and what consistency relationships that could violate, and you put them where you need to.)
Is that better or worse than Erlang's approach? Both, probably, depending on the details of what you're doing.
Right, the OO guys said to use "encapsulation" rather than direct mutation.
> so as a system scales up, this thing "way over there" can end up modifying an object's value
Can you not send a message way over there?
However, interior immutability is not without its benefits. It enables a very simple GC. GC is easily done per-actor because each actor has independent, exclusive, access to its own memory. But the per-actor GC is very simple because all references are necessarily backwards in time, because there's no way to update a reference. With this, it's very simple to make a copying GC that copies any active references in order; there's no need for loop checking, because loops are structurally impossible.
I don't know that this was the intent of requiring immutability, but it's a nice result that pops out. Today, maybe you could pull in an advanced GC from somewhere else that already successfully manages mutable data, but these were not always available.
Of course, it should be noted that BEAM isn't entirely immutable. Sometimes it mutates things when it knows it can get away with it; I believe tuples can be updated in some circumstances when it's clear the old tuple would not be used after the new one is created. The process dictionary is direct mutable data. And BIFs, NIFs, and drivers aren't held to strict immutability rules either, ets has interior mutability, for example.
Mutability is relative to the layer you're looking at. BEAM is, of course, completely mutable from top to bottom because it is constantly mutating RAM, except, of course, that's not really a helpful way of looking at it, because at the layer of abstraction you program at values are immutable. Mutatable programs can be written in terms of immutable abstractions with a well-known at-most O(n log n) penalty, and immutable programs can be written on a mutable substrate by being very careful never to visibly violate the abstraction of immutability, which is good since there is (effectively for the purposes of this conversation) no such thing as "immutable RAM". (That is, yes, I'm aware of WORM as a category of storage, but it's not what this conversation is about.)
Norvig's secret sauce is having 3 different projections of the same data, and that involves making updates in one part of a graph that are visible from three entry points.
I'm sure there are other solutions but mine didn't settle down until I started treating the three views as 3 sets of cell coordinates instead of lists of cells.
[0] http://venge.net/graydon/talks/intro-talk-2.pdf
[1] https://rust-lang.github.io/rfcs/0230-remove-runtime.html
Thanks for the links!