The language was designed around our ideas of what the problem really was and the best way of solving it. Massive, extremely lightweight concurrency was a critical part of attacking the problem, which was (and is) extremely concurrent. There are an enormous number of things going on in a switch which have to be handled concurrently, sometimes over 10k calls, plus running the switch itself. So the concurrency was fundamental. The error handling primitives were of course also a critical part of the solution, as fault tolerance was a critical part of the problem.
A lot of effort went into working out what the concurrency and error handling should look like to be right for how we wanted to attack the issues and solve the problems. Fortunately we worked with a user group designing a new architecture who tested our ideas and came back with extremely important feedback on what was good, bad, or just not necessary. They then became the first group to build a product using Erlang. It didn't use OTP, as OTP didn't exist then.
OTP came later and is “just” a formalised, generic wrapper around these ideas. We had client-servers, state machines and the equivalent to supervisors in products before OTP was developed. Behaviours came with OTP. And in the beginning there were many different versions of OTP alive at the same time.
Behaviours could not have been developed as they are without lightweight processes and communication. OTP is of course fundamental to the distribution and use of the language, as it encapsulates most of our original ideas about how Erlang should/could be used to build systems.
You can implement those same patterns if you also have lightweight processes, messages, and message queues. There's a Rust library that aims to do exactly that. The new WASM runtime (Firefly, formerly known as Lumen) aims to bring those patterns into the WASM ecosystem, potentially using this kind of concurrency pattern client-side.
What's potentially novel is adding in newer concurrency patterns that are in use in the Kubernetes community -- use of selectors, services, scaling with replicasets -- so that, instead of a static supervision tree, it can be a dynamic supervision tree, spread out across the entire pool, even as that pool expands or shrinks.
Armed with those successes I turned to thinking about applying something similar to a production Node app, and every time I think about it I realize what I really want is OTP. Each type of processing we do has its own caches and its own CPU usage, which could dovetail well with worker threads, but I have a chicken-and-egg problem with trying to move any of that work out of process, because I don't get any benefit until 1) none of the workers can silently crash and leave the application nonfunctional, 2) I'm handling half a dozen concurrent requests per Node process instead of one or two, and 3) message passing overhead is cheaper.
It's either too late or too early to rewrite this whole thing in Elixir.
No problem launching a 100k actors on a laptop.
Edit: Thanks for the additional context, dangoor. Cheers.
OTP seems like a pretty overloaded acronym—what does it mean here?
Indeed, this didn't come as a surprise because I've heard Erlang folks rave about OTP. What was really nice about this article (complemented by some of the comments here) was that it gave me a really nice intuition for what OTP really is, beyond "supervisor trees, y'know".
I will keep looking. Keep in mind, BEAM uses preemptive scheduling of actors rather than an async reactor, so it avoids the “storms” that can happen when an async reactor runs out of resources. EDIT: but I guess Tokio is capable of preemption through work-stealing too?
For example, if you recreated gen_server in a cooperative concurrency environment, one gen_server could use up all of the CPU and have an impact on the performance of the rest of the system. Maybe the other threads (microthreads, not OS threads) would still respond in <500ms, but if every request takes 500ms when they normally take 15ms you could essentially have outage-like conditions, particularly with upstream timeouts.
Instead, because BEAM is preemptive that one (or 10) hung gen_server doesn't hang up everything else on a node. Sure at some point performance will degrade, but that point is much further down the line than in cooperative concurrency models. There was a fantastic talk by Sasa Juric that demonstrates this in Erlang. [1] Otherwise you run a higher risk of even the supervisors being starved for CPU time, particularly if you are launching hundreds of processes that are all locking the CPU.
It's really the combination of the behaviors (OTP), the scheduler, lightweight threads, message passing, and immutability to me that makes the Erlang (Elixir for me) concurrency model so appealing.
Creating a language with the feel of a lisp, the environment of Smalltalk, and the concurrency of Erlang has been my dream for a long time.
Hey! Now there’s two of us! It would be interesting to compare notes. :)
After 20 years of really dedicated Smalltalk evangelism, I jumped out of that balloon a little over 10 years ago. Then I wandered in many strange lands embracing the polyglots' “right tool for the right job” mantra. What I found was more like “here's a lot of really mediocre tools; rarely is it crystal clear which of many is right for the job.”
A year or so ago, I built out an API server in Elixir (no Phoenix) and I've really loved it. Great community. I love that it's built on basic fundamental principles, and not a bunch of edge cases. I've always wondered what some sort of mashup would look like. If I was independently wealthy I would tinker away at such a thing.
I've not really thought about it beyond that it would be nice. I really wanted to love Smalltalk, but Pharo at least didn't run well on Arch for me, and I didn't much love the web frameworks/concurrency story. I will say that Seaside is probably what got me interested in Liveview in the first place though. So many "if only"s...the developer experience is amazing in Smalltalk though.
There's a lot that scares me about making languages and I've not studied it much. I've started reading through SICP though as a starting place. Writing a basic scheme interpreter (and future compiler/JIT compiler) as an image-based language (with change tracking) to make it interactive from the get-go seemed to be as far as I made it, maybe basing it on an existing Scheme, but tacking on preemptive scheduling sounds hard. But so is making a language and entire development environment. :)
Shoot me an email though if you'd like to chat (in my profile, also username@gmail.com), not 100% sold if this is something I'd want to take on in the future or not, I really wish it already existed and someone better than me had already made it. :-D
Have you looked at LFE (Lisp Flavored Erlang), by one of Erlang's co-creators, Robert Virding? (No Smalltalk-like environment, but 2 out of 3 ain't bad, right? :-) )
>Hey! Now there’s two of us! It would be interesting to compare notes. :)
Please make this happen. :)
I'm trying to eventually accomplish something like this: https://github.com/sin-ack/zigself
It's an implementation of the Self programming language in Zig, with an actor model inspired by Erlang.
The main thing to realize is that Lisp and Smalltalk are very much symmetrical in terms of structure. There is no real distinction between the two other than syntax and basic computation unit (closures vs. objects). And even closures can be used as objects and vice versa.
That only leaves the concurrency model. I have a basic implementation of actors using objects as the "context". It still has a long way to go to reach the supervisor tree model of Erlang, but interestingly enough, the ideas in the article are reflected here heavily; behaviorism is at the core of Self.
For Erlang, yes. For implementing behaviours (the point of my post), I don't think so (I sketch a "single threaded" solution towards the end).
I think one worker behaviour per CPU/core per stage in the processing pipeline is better than throwing thousands of processes at the problem and let the scheduler deal with it. This is what I got from Martin Thompson's talks (I linked to one of them in the "see also" section).
https://github.com/samsquire/preemptible-thread
It is a 1:M:N scheduler: one scheduler thread, M kernel threads, and N lightweight threads. I take advantage of the fact that loop indexes can be structures and can be modified by other threads, so we can set a thread's looping variable to the limit to end the current loop, pause it, and then schedule another thread.
[1] https://go.dev/src/runtime/preempt.go
[2] https://www.erlang.org/doc/man/re.html#:~:text=run/3%20alway....
I can't find the original announcement, but I found the accepted Go proposal: https://github.com/golang/go/issues/24543
I remember that being the straw that made me drop Elixir (which I love, to be clear). Go excels in many, many of the same places Elixir does, but it's way faster.
I do think Elixir has immutability on its side, which is huge for new developers, but there are far fewer developers in Elixir than in Go, so the end result doesn't change, unfortunately.
I found this a really interesting read, but this stuck out because it doesn't jive with my mental model of gen_server.
gen_server is fully serialized. Even the code underpinning it is not concurrent.
Now I guess gen_server does expose some top level functions to simplify sending/receiving a message into the process, but the process itself is serial.
And this is part of the genius of gen_server to me. You don't need to think about your state concurrently because it processes a single message at a time. Your system executes concurrently, but the individual components do not.
Maybe that is what the post means and I misinterpreted it.
His point was that the paradigm is so different at the VM level, that many things become irrelevant to the conversation. That said, there still are concurrent programming challenges on the BEAM, but it's very minimal compared to languages where it's not baked in.
Ruby Ractor is a good example of how a VM-backed concurrency mechanism will likely change how programs in that language can be built.
2. The application programmer writes sequential code, all concurrency is hidden away in the behaviour;
4. Easier for new team members to get started: business logic is sequential, similar structure that they might have seen before elsewhere;
gen_server is useful, but it's not much more than receive in a tail-recursive loop plus conventions around messages, most specifically about whether the sender expects a reply or not. It's not magic, and it's written in clear Erlang that anyone with a week of Erlang fiddling can understand (which kind of is magic; almost everything in OTP is clearly readable).
Concurrency comes from running many processes each with their own message queue.
I think it's even more nuanced, but doesn't really matter for this type of post.
(I am actively writing a chapter exploring this topic, so it has been top of mind.)
It is usually fine; for parallelism, just use a pool of gen_servers or many gen_server processes that map well to your data model (for example, one gen_server per network socket).
Though, these properties of gen_server come from the fact it is just a single erlang process.
I think architecting an Erlang system can be kind of tricky because you do have to think about all these processes sort of co-existing at the same time and how they interact. With one system I was involved with, we weren't really satisfied with how we'd divided things up among different behaviors, and did some refactoring. It wasn't difficult, but it did require a different way of thinking about things. When we had it all working, I was really satisfied with the results though. That thing was robust, and quite resilient.
I miss working with Erlang. It's tricky to find people who are using it or Elixir in the sweet spot, IMO. Lots of "it looked cool and we wanted to play around with it" out there, as well as some people thinking it'll magically make their systems "internet scale".
My best experiences with it have been semi-embedded systems where it's not doing too much distributed computing, but where "robust" and "predictable" are important qualities.
https://www.merriam-webster.com/words-at-play/jive-jibe-gibe
"... it seems possible that this use of jive will increase in the future, and if it does dictionaries will likely add it to the definition."
Language evolves.
All the other things flow from that thesis and understanding. You can recreate behaviours described in the repo doc using Erlang primitives very easily, and they are very hard to recreate in pretty much any other language.
Because Erlang is very much literally about lightweight processes and message passing. Only:
- every part of the system knows about lightweight processes and messages
- every part of the system is engineered around them
- the underlying VM is not only re-entrant in almost every single function, it's also extremely resilient, and can almost guarantee that when something dies, only that process dies, and the rest of the system receives a guaranteed notification that the process died
There are more, but without at least that you can't really recreate Erlang's standard behaviours. For example, you can't recreate this in Go or Rust because `panic` will kill your entire program unconditionally.
I read about Erlang on /r/programming, which was having a new fad for a new shiny language, as was the custom back then.
And I desperately followed all the /r/programming fads because I was worried that I'd end up irrelevant if I wasn't skilled up in the latest Haskell web framework.
But one of those fads was Erlang, and it intrigued me so much, that I ended up printing Mr Armstrong's thesis, stuck it in a manila folder, and read it, slowly, cover to cover while waiting for, or riding on, my bus to my contact centre job, and I've still got it in my bookcase.
His thinking on resiliency in the face of inevitable failures, and on safe concurrency, has shaped my thinking, and proved invaluable repeatedly and is very relevant today. It's like their phone switches were distributed microservices running in a container orchestrator, before it was cool.
I am now inspired to do the same, thank you!
A (worker) thread dying isn't an issue in Rust, Go, C#, etc. Sure, each goes about error handling in a slightly different way, either opt-in, opt-out, or enforced (when unwrapping Result<T, E>), but other than that the advantages of Erlang/Elixir have faded over time because the industry has caught up.
p.s.: C# has not one but two re-entrancy syntax options: 'async/await' and 'IEnumerable<T>/yield return'. You can use the latter to conveniently implement state machines.
It really hasn't. People fixate on the idea of just running some processes, and just catching some errors.
And yet, none of the languages that "solved this" can give you Erlang's supervision trees that are built on Erlang and the Erlang VM's basic functionality. Well, Akka tried, and re-implemented half of Erlang in the process :)
But other advantages did fade: multi-machine configurations are solved by Kubernetes. And it no longer matters that you can orchestrate multiple processes doing something when even CI/CD now looks like "download a hundred docker containers for even the smallest of tasks and execute those".
> p.s.: C# has not one but two re-entrancy syntax options - 'async/await' and 'IEnumerable<T>/yield return'.
What I meant by re-entrancy in the VM is this:
Every process in Erlang gets a certain number of reductions, where a reduction is roughly a function call or a message passed. Every time a function is called or a message is passed, the reduction count for that process is decremented by one. Once it reaches zero, the process is paused, and a different process running on the same scheduler is executed; once that one's count reaches zero, the next is executed, and so on.
Once the process's turn comes around again, its reduction counter is reset and it is resumed.
On top of that, if a process waits for a message, it can be paused indefinitely long, and resumed only when the message it waits for arrives.
So, all functions that this process executes have to be re-entrant. And it doesn't mean just the functions written in Erlang itself. It means all functions, including the ones in the VM: I/O (disk, network), error handling, date handling, regexps... You name it.
Erlang (and by extension Elixir) has the advantage that the programmer can think at a higher level about their system. You don't have to write or configure a scheduler. You don't have to invent supervision trees. You can be sure that the concurrently-running parts of your system cannot possibly affect each other's memory footprint (though Rust gives a robust answer to this problem as well).
It doesn't make a perfect fit for every problem, but there is still a decent-sized space of problems -- I'd say "highly concurrent, but not highly parallel" -- where Erlang gives the programmer a headstart.
I don't use Erlang but my understanding is that while it is not exactly fully pre-emptive, there are safeguards in place to ensure process fairness without developer foresight.
If your Rust thread panics while it holds a Mutex, you've got a bit of a mess. Especially if it was halfway through updating shared mutable state. Probably similar in Go or C#, but I haven't used Go and only did cargo cult programming in C#, I didn't read any sources or see warnings about crashing in threads or async/await.
Go channels are nice but they don't come close to Erlang message passing. In Go you can't just ignore whether the channel is bounded or unbounded, open or closed. Writing to a closed channel will blow you up. It takes some time to learn. Messages in Erlang are easy: FIFO delivery, serial execution.
If you're willing to make your "lightweight processes" OS threads you could kind of make it work. E.g. Rust gives you both panic hooks (to notify everyone else that you died) and catch_unwind to contain the effect of a panic (which generally stops at a thread boundary anyways). But of course that only scales to a couple hundred or thousand threads, so you probably have to sacrifice a lot of granularity.
And any library that links to C/C++ code has the potential to bring the whole process down (unless you make your "lightweight processes" just "OS processes", but that just makes the scaling problems worse)
It’s explained in many documents about lightweight processes: for Elixir/Erlang/BEAM of course, but also for Go and Crystal, and even going back to Solaris Internals and the modern Project Loom work for the JVM.
You can recover() panic() just fine in Go.
The nuance is that most languages let you build reliable programs _if your code is correct_: if you're using defers, context handlers, finalizers, cleaning up state in shared data structures, etc.
Erlang's goal is to be reliable "in the presence of software errors", that is, even if your code is buggy. If a request handler process dies inside an Erlang web app, whatever file or socket it opened, memory it allocated, shared resource it acquired (e.g. a DB connection) will be reclaimed. This is true without having to write any error handling in your code.
The way it's done is that the VM handles the basic stuff like memory and files, and it provides signals that libraries (like a DB connection pool for instance) can use to know if a process fails and clean up as needed. In other words the process that fails is not responsible for its own clean up.
At some point some code must of course be correct for this to work. Like, if the DB connection pool library doesn't monitor processes that borrow connections, it could leak the connection when such a process dies. But the point is that this part (the "error kernel") can be a small, well-tested part of the overall system; whereas in a classic program, the entire codebase has to handle errors correctly to guarantee overall stability.
It's very hard to build Erlang-like versions of supervision trees using those tools.
You can catch_unwind() panic!() in Rust too :-).
"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang." - Virding's Law ;)
So first you have to implement all those things. And then use them.
I mean, you can write Erlang-style in almost any language; in that case you just need to adapt all the libraries to follow the same principles.
Some languages implement similar error handling strategy by just creating a separate process per request (hello, php). We know how to clean after a worker dies (just let the worker die). Just in that case supervisor strategy is very simple.
Pure magic, that can't be recreated in any regular language :)
In terms of being fault-tolerant I think the modern approach with (micro)services is quite similar, one can have multiple services running and communicating using something like protobuf, having restart strategies, fallbacks and so on. From my experience Erlang doesn't offer any killer features in this case, does it?
As a developer I think it's easier to think in terms of how to send data to hadoop, sqs queue etc for processing and read results later than keep in mind the supervision tree, messages, mailbox size, linking and so on. And the "processing" side can be as well implemented in Erlang just I don't feel Erlang's features are needed in the "top level development" and create more problems and barriers.
"the whole team working on Erlang quit and started their own company."
The same event as described in "A history of Erlang": "In December, most of the group that created Erlang resigned from Ericsson and started a new company called Bluetail AB."
https://www.labouseur.com/courses/erlang/history-of-erlang-armstrong.pdf
'most of the group that _created_ Erlang' is not the same thing as 'the whole team _working_ on Erlang'. Or, quantifying it, at the end of the 'history' paper, there's a list of 45 people under 'implementation' and 'tools'. Around 35 were at Ericsson in 1998. Of those, nine or ten quit to form Bluetail, and another two or three left for Bluetail later on. (In 1998, there were two connected groups working on Erlang in the same building in Älvsjö, Stockholm. One was the computer science laboratory (CSLAB), where Erlang was created, the other was "Open Systems", which had more of a development role. A significant part of CSLAB left. Almost all of 'Open Systems' stayed. Many that stayed were already doing a stellar job on Erlang and many still are.)
What does `gen` mean in `gen_server` ?
Thanks for the replies: "Generic". In that case this article is not well organized; with a simple explanation of the naming, it would be easy to guess what the system does.
Unexplained abbreviations don't help the writing.
Some more examples, in Ruby, instead of calling `implement Module`, it uses `include Module` . I do think include is not as clear as implement.
Recently, I was adding a similar abstraction to Rust on a project I'm working on and I called it `AbstractProcess`. Like, it is some kind of template for a process. Still don't think it's clear what it does by just looking at the name. Does anyone have a better idea on how to name such a pattern?
Agreed, though I do think the `include` naming might just be a relic from a time before `prepend` and `extend` also existed in Ruby. I find it a lot more intuitive when I remember it as `append`ing to the ancestors, and the language even sort of calls it that internally with the naming mismatch between `include` and `append_features`: https://ruby-doc.org/3.2.0/Module.html#method-i-include
Then one can see that's exactly what it does to a Module's ancestor chain:
irb(main):001:0> RUBY_VERSION => "3.2.0"
irb(main):002:0> lol = ::Module::new => #<Module:0x00007f2969821740>
irb(main):003:0> lol.ancestors => [#<Module:0x00007f2969821740>]
irb(main):004:0> lol.singleton_class.ancestors => [#<Class:#<Module:0x00007f2969821740>>, Module, Object, PP::ObjectMixin, Kernel, BasicObject]
irb(main):005:0> rofl = ::Module::new => #<Module:0x00007f296982dfe0>
irb(main):006:0> lmao = ::Module::new => #<Module:0x00007f296982fd40>
irb(main):007:0> omg = ::Module::new => #<Module:0x00007f2969822a00>
irb(main):008:0> lol.include(rofl) => #<Module:0x00007f2969821740>
irb(main):009:0> lol.prepend(lmao) => #<Module:0x00007f2969821740>
irb(main):010:0> lol.extend(omg) => #<Module:0x00007f2969821740>
irb(main):011:0> lol.ancestors => [#<Module:0x00007f296982fd40>, #<Module:0x00007f2969821740>, #<Module:0x00007f296982dfe0>]
irb(main):012:0> lol.singleton_class.ancestors => [#<Class:#<Module:0x00007f2969821740>>, #<Module:0x00007f2969822a00>, Module, Object, PP::ObjectMixin, Kernel, BasicObject]

While (as a sibling comment notes) “append” might be better than “include” given other terms used in Ruby, “implement” would be completely wrong. Modules aren’t interfaces; in fact, they are almost exactly the opposite. A class in a language with interfaces declares that it “implements” an interface because the interface provides guarantees, and the class provides an implementation of those guarantees. An included Ruby module provides implementation, not guarantees that the class provides implementations for.
“include” describes what it does much better than “implement”.
Yeah, until the business problem itself involves inherent concurrency, which usually happens much faster than people think. Or until I, as the non-expert, want to dig in to make changes or debug a problem.
This distinction into "expert" and "lowlife (SCNR) using the expert abstractions" is really one that doesn't hold in practice most of the time.
I think it's much better to embrace that concurrency is a cross-cutting-concern and reality is that it can happen on any level, so the language should better support reality.
Seems to me all the tools are available to accomplish this effectively nowadays.
Good luck on your project, in any case.
Yes, a large part of programming will be just a special case of doing mathematics.
I admire your ambition!
https://GitHub.com/samsquire/algebralang
It's based on the idea there are relations between variables and every function is a concurrent process.
Does anyone know why?
> In February 1998 Erlang was banned for new product development within Ericsson—the main reason for the ban was that Ericsson wanted to be a consumer of software technologies rather than a producer.
From Bjarne Däcker's thesis (2000, p. 37):
> In February 1998, Erlang was banned within Ericsson Radio AB (ERA) for new product projects aimed for external customers because:
>
> “The selection of an implementation language implies a more long-term commitment than selection of processors and OS, due to the longer life cycle of implemented products. Use of a proprietary language, implies a continued effort to maintain and further develop the support and the development environment. It further implies that we cannot easily benefit from, and find synergy with, the evolution following the large scale deployment of globally used languages.” [Ri98]
"In February 1998, Ericsson Radio Systems banned the in-house use of Erlang for new products, citing a preference for non-proprietary languages. The ban caused Armstrong and others to make plans to leave Ericsson. In March 1998 Ericsson announced the AXD301 switch, containing over a million lines of Erlang and reported to achieve a high availability of nine "9"s. In December 1998, the implementation of Erlang was open-sourced and most of the Erlang team resigned to form a new company Bluetail AB. Ericsson eventually relaxed the ban and re-hired Armstrong in 2004."
Not wanting to rely on a fairly esoteric in-house language makes some sense.
Since then things have changed significantly of course.
[0] https://web.archive.org/web/20170829230730/https://www.erics...
Even as complex as C++ is, I can train a great programmer in C++ a lot faster than I can train a junior C++ programmer to be a great programmer. Yes, you will encounter the rough edges of whatever language often in the first 5 years, but a great programmer will be great in any language quickly. You need one expert in the language on the team for the weird complex stuff, but most code isn't that complex.
So, you can probably create erlang behaviours in any language, but it requires building a framework and then training every developer in how to use it. There is probably value in having a standard version of erlang behaviours for a range of different languages.
Oh no :P
Very interesting read though, thanks for sharing!
I heard Ziggy still works there.
He can’t have hated it that much, then. ;)
You can implement actor behaviours in Go or Node or even C, but without that lower-level support it will never give you the stability guarantees that Erlang process isolation gives.
To draw a weird comparison, Elixir (with Erlang process isolation) brings two worlds together. First, it's a PHP/Ruby level of fire-and-forget productivity, because each HTTP request is handled in an independent isolated process which, if it crashes, won't affect the system, but instead automatically provides a nicely debuggable crash log. And second, it natively provides all the distributed tools for long-lived systems. E.g. PubSub, sessions, and database connections don't have to be rebuilt on a per-request basis like in Ruby/PHP but can be their own long-lived processes.
If there would be a library that could bring this easy to use process isolation+communication e.g. to C programming it would be a game changer. But the best you get in all other languages I'm aware of is to use actual process isolation (fork multiple node/ruby/go processes) and then use some kind of IPC manually or redis or k8s...
In the Behaviors section you talk about behaviors being similar to interfaces in Go and give a Go example. But then you switch examples for Erlang. Maybe show the Joe/Mike example written in Erlang and then say, here's a more complicated example (the key-value example) that really describes behaviors better.
In the light of this statement, the answer to what I think is the thesis question of that entire piece:
"This begs the question: why aren't language and library designers stealing the structure behind Erlang's behaviours, rather than copying the ideas of lightweight processes and message passing?"
Is that while Erlang has a lot of good goals, the results of how they got there are simply not the state of the art. Or, to put it another way, language designers are not copying Erlang, and they are correct to not copy Erlang.
I respect Erlang a lot. They were a good 10-15 years ahead of their time. However, you will note that if you add 10-15 years to the creation date of Erlang, you still end up in the past. If Erlang were to come out today, fresh, nobody had seen it before, into an otherwise identical programming language environment, I would say it makes several mistakes.
One I've written about before is that Erlang is a non-Algol language for no reason: https://news.ycombinator.com/item?id=7277957 (Note the library mentioned in that post, suture, is now mature, and I use it all the time. It works for what I need.) But in the context of this post, that's less critical.
The other major mistake I'd say it made if it came out in 2023 is that it is a totalizing environment. By that I mean that it has this built in implicit assumption that it is responsible for all the reliability in the system, and you don't get Erlang's features very conveniently if you don't use it as the complete control backplane for your entire system. You run an Erlang cluster, and it bundles all the message passing, restarting, reliability, cluster management, software deploy, and everything into one system.
But for the world we live in today, that's a price not worth paying. We don't need the Erlang message bus to be the only message bus. The Erlang message bus is, frankly, not very good, and it's actively terrible if you want to use it for one non-Erlang process to communicate to another. We don't need the Erlang message bus. We have a dozen message busses, some in the cloud, some commercial, some open source, some that double as retention (Kafka), all of which scale better, none of which tie you to Erlang's funky data types.
And then, within the matrix of those message busses, we don't need Erlang's restart capability. We have an abundance of ways to restart processes, from systemd, to kubernetes, to any number of other ways.
We don't need Erlang clusters for redundancy any more. You just run multiple copies of a service against the message bus, on multiple systems for redundancy.
We don't need Erlang's behaviors. We have interfaces, traits, object orientation, and even just pushing that entire problem up to the OS process level, or writing a cloud function, and any number of ways of achieving the same goal.
Erlang's software deploy is interesting, but we have a lot of options for it. The whole attempt to do live updates is interesting, but it also imposed a lot of constraints that the vast majority of systems, which don't have that need, don't want. This is perhaps the space where the state of the art isn't that far ahead of Erlang. It's still a mess, despite all the churn in this space. But even so, with all the options available, you can probably find something better for your system than the Erlang way of upgrading software, even if it isn't necessarily much easier.
The cognitive hazard that Erlang presents the community in 2023 is that it has some very good writing on the topic of reliability and its other goals, and then, naturally, one segues into the discussion of how Erlang solved the problem. And it was a very interesting solution for the time. I used Erlang for many, many years back when it was effectively the only solution to these problems.
But it isn't the only solution anymore. The space has exploded with options. Unsurprisingly, the ones that a highly innovative pioneer tried out first are not the best, or the only. They chose well. Let me again emphasize my respect for the project. But it's not on the cutting edge anymore.
Granted, the diversity of options does mean the real world has gotten quite chaotic, where you may have three message busses attaching systems implemented in a dozen different languages, but that's something you can take up with Conway's Law. Erlang couldn't work with Conway's Law without totally converting your entire company to it, which just isn't going to happen.
The reason why language designers aren't rushing to copy Erlang is that what was excellent and amazing in 2000 (and, again let me underline, I mean that very seriously, it was a cutting edge platform built with a lot of vision and moxie) is, in 2023, mediocre. Erlang is a mediocre language (Elixir is at least "good", Erlang is mediocre), attached to a mediocre message bus, with a type system that doesn't even reach mediocre, with a mediocre totalizing approach to system design where there's a very significant impedance mismatch between it and the rest of the world, with an at-par-at-best VM (I won't call that mediocre, but where it used to be head-and-shoulders above everything else in certain ways, it is now merely competitive), with mediocre standard libraries, and a mediocre product fit to its own stated goals. It just isn't the best any more.
The state of the art right now is super chaotic. I can hardly get two systems deployed on the same infrastructure any more, because there's always some reason something in that list has changed. But when the chaos settles and best practices emerge, something that I'd say is at least a good 5 years away, the result will clearly have Erlang inspiration in it for sure... but it won't look a lot like Erlang on the surface.
What is worth copying has largely been copied. It doesn't look exactly like Erlang, but this turns out to be a good thing.
I agree with pretty much all of your comment (which clearly comes from a place of deep experience), but the thing that keeps bringing me back to the ideas of Erlang--potentially trying in vain to implement similar concepts in the languages I actually work in (including developing a way to manage fibers in C++ coroutines that work similarly to Erlang processes so I could debug background behaviors)--is the idea that these restartable and isolated units simply aren't large enough to be managed by the operating system and systemd or kubernetes of all things: they are things like individual user connections. While there are plentiful easy ways to do shared-nothing concurrency attached to virtually every software project and framework these days, they are all orders of magnitude more expensive than what Erlang was doing, even with its silly little kind-of-inefficient VM.
Yes, today we have an entire collection of ecosystems of services that can provide the key functionalities described above. Each of these technologies comes with its own long tail of dependencies, security issues, maintenance effort, plain old computational overhead, etc.
Meanwhile, this 30-year-old technology provides matching functionality (yes, admittedly with syntax and object types that simultaneously induce vertigo and motion sickness), but all the bugs have long been eradicated or encased in amber, and pound-for-pound it will run circles around an alternative solution that's dragging a Java VM or a megaton of node_modules along with it, wrapped in docker images and k8s yamls.
I use yaws as my go-to webserver. It's nuke-proof. It's simple*, and it Just Works. Good luck finding a haxxor that can breach it. I believe that it's in large part due to the very simple conceptual building blocks it's constructed out of (the gen_ behaviours OP describes).
Yep, if you want to write a new concurrent system, you will move much faster living inside this integrated environment.
And of course vertical scaling is easier than horizontal. It'll be a long time before you outgrow a huge server.
And if it's not, that's a nice problem to have. So you split off parts of the app into other services, and erlang/elixir is wonderful at communicating with/controlling other network-addressable services.
The problem with erlang is that it's both harder to get started and has a lower ceiling than some other languages. But there's a huge middle class of software that would really benefit from it if they got over the initial hump.
The thing about Erlang is that you never needed clusters at all. The redundancy was built into each instance with the runtime. When you build that way, everything naturally scales out horizontally with additional processors and/or physical nodes.
You can't do that in any other language without building the entire system for it from the ground up.
Using multiple systems for redundancy means accepting that an entire system can go down. The Erlang way isolates that impact to 1 of potentially millions of parallel actions on the system itself; with whole-system redundancy, the other million actions in progress on that server go down with it. The difference in the level of redundancy is significant.
But I do agree with you that we don't need it for most systems because most systems simply aren't that complex. The benefit of the BEAM comes from simplifying complexity, which tends to evolve over time. Elixir, Phoenix and LiveView will likely lead to earlier adoption of the BEAM in projects before the complexity ramps up which will show a long term benefit.
IIRC, if you read the original thesis, the reason for clusters is just that there's always that chance an entire machine will go down, so if you want high reliability, you have no choice but to have a second one.
The OP is correct in that the key to understanding every design decision in Erlang is to look at it through the lens of reliability. It also helps to think about it in terms of phone switches, where the time horizon for reliability is in milliseconds. I am responsible for many systems with a high need for reliability, but not quite at that granularity. A few seconds' pause, or the need for a client to potentially re-issue a request, is not as critical as missing milliseconds in a phone call.
In such cases, I would agree with part of your criticisms because it is indeed the wrong tool for the job. Erlang was not designed to solve this problem: the serialization format is centered around Erlang, and the distribution messages reflect the semantics of processes, messaging, monitoring, etc. Interoperability with other systems is not the focus. Even in its early days, the distribution was used to provide tolerance against hardware failures by running two identical systems. So I find comparing Kafka and Erlang to be an apples-to-oranges scenario.
In my opinion, Erlang shines for building homogeneous systems: multiple instances of the same application running in a cluster. Precisely because all I need is Erlang. It comes with a unified programming model for both local and distributed execution. Look at how Phoenix uses it to provide features such as distributed pubsub and presence out-of-the-box, features which either require external tools - and additional complexity - or simply do not exist on other platforms. And the beauty in designing such solutions is that you start with the concurrent version and then naturally evolve into making it distributed (if necessary).
I also find the comparison equally misses the mark between restarts/fault-tolerance and Kubernetes. Because, once again, they mostly work at different levels. The classical example is using supervisors to model database connections, something you simply cannot delegate to k8s. But a more recent example comes from working on Nx, which communicates to the GPU. You can stream data in and out of the GPU, but what happens when the code streaming data errors out? You need to develop a synchronization mechanism to make sure the GPU does not get stuck. And what happens if the synchronization mechanism fails? With Erlang I can model and test all of those failure scenarios quite elegantly. Perhaps there are better approaches lurking out there, but it certainly isn't k8s.
When it comes to k8s, they mostly complement each other. Erlang's tooling for restarting _machines_ is basically non-existent (there is -heart or the stand-by system described by Joe) and k8s addresses that. Erlang doesn't have service discovery; k8s covers that gap. But, even then, there is no assumption that you must use Erlang clustering. It is opt-in, you don't have to use it, and in the "worst case" scenario, you can deploy Erlang just as any other language.
That was my biggest point, but you posted a long comment with a few other disagreeable points, too:
1. re:Conway's Law : Just write the code in Erlang (read: Elixir in this day and age) to begin with. Rewriting large applications in new languages is rarely worth it anyways, that's not a failure of the better language you want to switch to, it's a failure of the poor choice of language you started with.
2. re: But it's not on the cutting edge anymore: And yet it is. Erlang and Elixir are the only languages in anything remotely resembling widespread use to not have an utterly pathological concurrency story. Async-Await is an awkward crutch to shoehorn concurrency into languages that were never designed to support it, Go and its goroutines entirely misses the point of the exercise and simultaneously encourages you to write mutable state and punishes you for doing so, and the Actor model libraries for other languages are just half-baked, bug-ridden implementations of a small fraction of Erlang.
I think this says more than you think it does.
I've worked in software development professionally for 20 years. I have never worked at a company that had more than 1 backend language. A large percentage of developers work at small companies.
Erlang/Elixir/BEAM isn't for the Googles of the world. And that's fine. Tech needs to stop its infatuation with these companies. Just because something is right for those companies, doesn't mean it's right for yours. Ignoring this has done a lot of damage over the years. Production complexity is a killer.
People love to talk about scaling (and here, I'm talking about scaling a company), but only ever mention one aspect: scaling up. Things also have to scale down. Technological choices tend to sacrifice one for the other.
But BEAM languages have a wider scaling band than most other technological choices. And that is in large part because of everything it offers out of the box.
> "This begs the question: why aren't language and library designers stealing the structure behind Erlang's behaviours, rather than copying the ideas of lightweight processes and message passing?"
> [...]
> But for the world we live in today, that's a price not worth paying. We don't need the Erlang message bus to be the only message bus. The Erlang message bus is, frankly, not very good, and it's actively terrible if you want to use it for one non-Erlang process to communicate to another. We don't need the Erlang message bus. We have a dozen message busses, some in the cloud, some commercial, some open source, some that double as retention (Kafka), all of which scale better, none of which tie you to Erlang's funky data types.
I asked why they were not stealing the structure behind Erlang's behaviours; I didn't suggest anyone should steal Erlang's message bus or anything else.
> And then, within the matrix of those message busses, we don't need Erlang's restart capability. We have an abundance of ways to restart processes, from systemd, to kubernetes, to any number of other ways.
I don't think restarting the process from systemd or kubernetes is comparable with a supervision tree. First of all, the tree gives you a way to structure and control the restarts, e.g. frequently failing processes should be further down the tree or they will cause their sibling nodes to get restarted, etc. The other obvious difference is speed.
> We don't need Erlang's behaviors. We have interfaces, traits, [...]
Yet I don't know of any other language which uses interfaces in a way that achieves the benefits (listed in the article) that behaviours in Erlang (e.g. gen_server) give you, do you?
To some extent I know... but to some extent the answer is these things are all tied together. Erlang is a really tight ball of solutions to its own problems at times. I don't mean that in a bad way, but it all works together. It needs "behaviors" because it didn't have any of the other things I mentioned.
When I went to implement behaviors (https://www.jerf.org/iri/post/2930/ ), I discovered they just weren't worth copying into Go. You ask if any other language uses interfaces to achieve what Erlang does; my perspective is that I've seen people try to port "behaviors" into two or three other languages now, and they're always these foreign things that are very klunky, and solve problems better solved other ways.
"I don't think restarting the process from systemd or kubernetes is comparable with a supervisor tree."
It isn't, but the problem is...
"First of all the tree gives you a way to structure and control the restarts, e.g. frequently failing processes should be further down the tree or they will cause their sister nodes to get restarted etc."
I don't need that. I've been using supervisor trees for over a decade now, and they rarely, if ever, go down more than "application -> services". Maybe somebody out there has "trees" that go down six levels and have super complicated bespoke restart operations on each level and branch, but they must be the exception.
To the extent that I have deep trees, they're for composition, not because I need the complicated behaviors. A thing that used to be a single process service is now three processes, and to hide that, I make that thing a supervisor of its own so that the upper levels still just see an ".Add()" operation, instead of having to know about all the bits and pieces.
"The other obvious difference is speed."
Certainly, but those are only some of the options.
"Yet I don't know of any other language which uses interfaces in a way which they achieve the benefits (listed in the article) that behaviours in Erlang (e.g. gen_server) give you, do you?"
This is a case of what I'm talking about. Don't confuse Erlang's particular solution for being the only possible solution. Erlang's behaviors are basically the Template pattern (https://en.wikipedia.org/wiki/Template_method_pattern ) written into the language rather than implemented through objects. If you look for the exact Erlang behaviors out in the wild, you'll hardly find anything. If you look for things that solve the same problems, there's tons of them. A lambda function in AWS is a solution to that problem. The suture library I wrote is a different one. Java frameworks have their own solutions in all sorts of different ways.
To put it another way: whereas in 1998 people having the problems Erlang solved was rare, today we all have them. It can't be that we're all blundering around with no solutions because we're too blinkered to use Erlang, which just solves them all; that makes no sense. There are far more distributed systems concerned with reliability out there now implemented in not-Erlang than in Erlang. We are not all just blundering along in a fog of confusion, unaware of the existence of architecture, modularity, and abstraction. If programmers have a flaw, it's too much architecture rather than too little.
Maybe that's one of the problems with the Erlang writing. It's all implicitly written from a perspective of the 90s, where this is all a surprise to people, and it kind of seeps in if you let it. But that's not where the world is right now. It is not news that we need to be reliable. It is not news that we want to run on multiple systems. I've got non-technical managers asking me about this stuff at work whenever I propose a design. There's been a ton of work on all of these issues. It's not all good, by any means! But the problem now is too many solutions rather than not enough.
None of these achieve the same goal. Or they result in significantly more complicated and brittle systems. Or they only achieve that goal insofar as you need to glue several heterogeneous systems together.
> The reason why language designers aren't rushing to copy Erlang is that what was excellent and amazing in 2000 (and, again let me underline, I mean that very seriously, it was a cutting edge platform built with a lot of vision and moxie) is, in 2023, mediocre.
The main reason is that it is borderline impossible to retrofit the Erlang model onto an existing language. Adding concurrency alone may be a decade-long project (see OCaml). Adding all of the guarantees that the Erlang VM provides... well.
And on top of that too many people completely ignore anything in Erlang beyond "lightweight processes/actors".
The fact that you can have an isolated process that you can monitor and observe, and have a guaranteed notification that it failed/succeeded without affecting the rest of the system is a) completely ignored and b) nearly impossible to retrofit onto existing systems.
And there are exceedingly few new languages that even think about concurrency at all. And async/await is not even remotely state of the art (but people are busy grafting them onto all languages they can lay their hands on).
State of the art still is mutexes and killing your entire program if something fails. Often both of those.
That's incredibly optimistic. I am 30 and have basically no hope this will happen in my lifetime, and in the meanwhile, BEAM works great. I agree with all your points, except when you write 2023 as if we're doing things better now. Research in this area hasn't borne as much fruit over the last 30 years as you make it seem.
That it _also_ ships with other ways of doing things in no way constrains or limits your decisions, and most modern Erlang (or Elixir) applications I have maintained ran the same way.
You still get message passing (to internal processes), supervision (with shared-nothing and/or immutability mechanisms that are essential to useful supervision and fault isolation), the ability to restart within the host, but also from systemd or whatever else.
None of these mechanisms are mutually exclusive so long as you build your application from the modern world rather than grabbing a book from 10-15 years ago explaining how to do things 10-15 years ago.
And you don't _need_ any of what Erlang provides, the same way you don't _need_ containers (or k8s), the same way you don't _need_ OpenTelemetry, the same way you don't _need_ an absolutely powerful type system (as Go will demonstrate). But they are nice, and they are useful, and they can be a bad fit to some problems as well.
Live deploys are one example of this. Most people never actually used the feature. Those who need it found ways (and I wrote one that fits in somewhat nicely with modern kubernetes deployments in https://ferd.ca/my-favorite-erlang-container.html) but in no way has anyone been forced to do it. In fact, the most common pattern is people wanting to eventually use that mechanism and finding out they had not structured their app properly to do it and needing to give it a facelift. Because it was never necessary nor totalizing.
Erlang isn't the only solution anymore, that's true, and it's one of the things that makes its adoption less of an obvious thing in many corners of the industry. But none of the new solutions in the 2023 reality are also mutually exclusive to Erlang. They're all available to Erlang as well, and to Elixir.
And while the type system is underpowered (there are ongoing areas of research there -- I think at least 3-4 competing type systems are being developed and experimented with right now) and the syntax remains what it is, I still strongly believe that what people copied from Erlang were the easy bits that provide the least benefit.
There is still nothing to this day, whether in Rust or Go or Java or Python or whatever, that lets you decompose and structure a system so its components have that type of isolation, that clarity about dependencies and the blast radius of faults, or the ability to introspect things at runtime interactively in production, the way Erlang (and by extension, languages like Elixir or Gleam) provides.
I've used them, I've worked in them, and they don't compare on that front. Regardless of whether Erlang is worth deploying your software to production on, its approach is as illuminating as the stacks that push concepts such as purity and lack of side-effects, in what it lets you transform in how you think about problems and their solutions.
That part hasn't been copied, and it's still relevant to this day in structuring robust systems.
Genuinely curious why it's not very good. Were you speaking solely from the perspective of non-Erlang processes? And also specifically regarding remote messages rather than local?
It can implement the Erlang node protocol and show up as a full Erlang node. Neat capability, but the impedance mismatch between systems is something fierce, because the protocol deeply assumes that you're basically Erlang, e.g., not just that processes have mailboxes but that mailboxes have the exact same semantics as Erlang, and you have to implement process linking with the exact same semantics, etc. It's difficult.
Alternatively, you can write a proxy in Erlang where the first process speaks to an Erlang server to send a term to some other Erlang process that will then relay the message in whatever form. This will either be a custom protocol, in which case this is an awful lot of code to write for such a task, or a common messaging protocol in which case you don't need Erlang. That is, you can speak rabbitmq, but that doesn't need to be implemented in Erlang.
The production system I maintained on Erlang did this a lot, because I was forced to have a lot of Perl code interacting with the Erlang system. Back in the day it was a fine choice; it was a time when you couldn't just pop on to google and turn up a dozen battle-ready message busses in two minutes. Now, though, it is far better for the Erlang code to just be another node on your common bus than for it to be the message bus.
Lightweight processes using message passing is how Erlang stays up (and perhaps ironically, let-it-crash is how Erlang avoids going down).
Lightweight processes and message passing are the architectural decisions that address the business problem Erlang was designed to solve.
(And I say this as someone that's at least passably familiar with go and have worked on/contributed to golang projects in the past.)
I'm also coming from a ruby/js/ocaml/elixir/erlang background.
The thing I want to define with behaviours is observable effects on objects.
Such as interactions between tasks and the objects shared between them: not necessarily method calls, but state interactions across multiple objects.
Some objects should be colocated on threads and send behaviours to arbitrary groups of objects. Think of it as a collection that responds to events.
I wrote a state machine serialization that looks like this. It defines a state progression between threads and an async/await state machine. I think organised state machines would make actor programming much easier.
next_free_thread = 2
task(A) thread(1) assignment(A, 1) = running_on(A, 1) | paused(A, 1)
running_on(A, 1)
thread(1)
assignment(A, 1)
thread_free(next_free_thread) =
    fork(A, B)
    | send_task_to_thread(B, next_free_thread)
    | running_on(B, 2)
paused(B, 1)
running_on(A, 1)
    | { yield(B, returnvalue) | paused(B, 2) }
      { await(A, B, returnvalue) | paused(A, 1) }
    | send_returnvalue(B, A, returnvalue)
I think this serialization is truly powerful and easy to understand.

Erlang is mainly about OTP. OTP delivered an opinionated take on distributed components -- think of Erlang on OTP as Ruby on Rails -- and did it exceptionally well.
But one could still do this with Java btw. The specs are still there and they are solid.
I don't know why the author is turning this into a competition of whether processes are more important or behaviors - they both are parts of a well designed system that work well together.
GenServers etc can't be written equivalently in Go/Java since goroutines and Java threads (even the new virtual threads) are not pre-emptive, whereas Erlang processes are truly independent.
What's the story there? Why did they decide to ban it?
EDIT: doh, this has been answered elsewhere in this thread.
This is an interesting software pattern -- especially having it supported entirely inside of one programming language... We see this software pattern in such things as Windows Services (they restart automatically if they fail), Unix/Linux daemons, one of the original purposes of the original Unix 'init' process, and in high-availability, mission-critical 24/7 systems (databases, etc.).
Having all of that fault-tolerant infrastructure entirely inside of a programming language is indeed, somewhat novel.
But let's work up what a given language must have as prerequisites to support this...
First, we need the ability to run multiple processes inside of a language. Many languages can accomplish this with threads, but of the languages that do this, many need additional cumbersome programming to guarantee thread safety...
So the language needs to support the notion of a threaded process inside of that code -- without having to code extra to support this threaded process.
Next, the language needs some form of communication between supervisor (process A) and supervised (process B) code. Message passing is the solution -- but that requires each process to have its own message queue.
And message passing requires interfaces...
Here we're starting to sound like we're duplicating a mini Operating System inside a language(!) (should the language have pipes, too?) -- and/or that the complexity of most OS's would be deeply mitigated by designing them inside of a language that supported these constructs...
Whatever the case, I think it's a fascinating software pattern.
Yes, Go exists and supports features like this, Rust exists, and people are using it to write Operating Systems. And there are probably all other kinds of languages (both existing and existing in the future) that do/will support some form of these constructs...
But there's another interesting, purely academic, purely theoretical question here...
That is: What is the simplest full-featured OS (processes, IPC, synchronization primitives, memory management, hardware resource management, syscalls, scheduling, etc.) that could be written in the smallest amount of lines of code if the language it was written in had intrinsic knowledge of all of those underlying constructs?
Anyway, a fascinating article about Erlang, and it definitely gave the impression that Erlang was a whole lot more than I thought it was... I will be checking out Erlang for future projects!
In case you didn't know, Joe Armstrong actually calls Erlang/OTP an "Application Operating System(AOS)" in his thesis paper.
That is the entire point of that language's design.
Being able to do this at the programming language level is extremely powerful and creates an entirely different way of building applications. My go-to analogy is building a city (lots of individual, isolated, separate stack processes) instead of a big skyscraper (deep stack requests, concurrency difficult, error recovery manual).
You can build that city in a limited way with k8s, but there's way more overhead along the way, to the point of it not being enjoyable for me.
When some new container technology or worker-queue platform comes along, Erlang/the BEAM will be exactly the same.
One of the advantages of working with Erlang I found is that it's just as easy to scale down as it is to scale up.
How does one scale down from Kafka once you have vendor lock in with them?