"[...] and I really dislike that Elixir tries to hide immutability. That does make it slightly easier for beginners, but it’s a leaky abstraction. The immutability eventually bleeds through and then you have to think about it."
I don't think it necessarily tries to hide it (at all), but it does have some instances where something feels like a mutable structure. Those can be, at least for me, a bit confusing to reason about if you're expecting things to both be and look immutable.
I suppose now that I know exactly what's weird, I should just go dig through the code and figure it out. Problem solved?
... One other thing, because I see this in the comments already, is that BEAM isn't the tool for every job -- but for some jobs, it is the only tool that does them well. Is the JVM faster at general tasks? Hell yes, but that's not the point; it's not even why BEAM is around.
It's about:
* Small concurrent workloads. Really long-running, CPU-intensive tasks aren't going to do well.
* Low latency. Not just low, but with a very, very small standard deviation. Your application's performance will be consistent.
* Fault tolerance.
The list goes on, and here's a nice summary of it (both bad and good):
http://blog.troutwine.us/2013/07/10/choose_erlang.html
There are times when I choose the JVM, and there are times when I choose BEAM or MRI. I just try to choose the right tool for the job, but some tools make some jobs very difficult.
cough ruby cough concurrency cough
Edit: One thing for people not familiar with BEAM, a "process" is not a Unix process, from the Elixir documentation:
"Processes in Elixir are extremely lightweight in terms of memory and CPU (unlike threads in many other programming languages). Because of this, it is not uncommon to have tens or even hundreds of thousands of processes running simultaneously."
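To make the quote concrete, here's a minimal Elixir sketch (mine, not from the docs): each process starts with a heap of only a couple of kilobytes, so spawning a hundred thousand of them in one VM is routine.

```elixir
# Spawn 100_000 processes; each blocks waiting for a :ping message.
# This completes quickly and uses only a modest amount of memory.
parent = self()

pids =
  for i <- 1..100_000 do
    spawn(fn ->
      receive do
        :ping -> send(parent, {:pong, i})
      end
    end)
  end

# Every one of them is alive and individually addressable.
send(hd(pids), :ping)

answer =
  receive do
    {:pong, 1} -> :ok
  after
    1_000 -> :timeout
  end
```

Try doing that with OS threads and watch your machine fall over.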
Being able to open a remote console and do system introspection/tracing/profiling/debugging is a huge advantage when running in production. And all languages running on top of BEAM, of course, get this for free.
In my experience, running JVM in production with tools like JProfiler/VisualVM/jconsole, etc. does not come close to the BEAM when trying to understand what is happening in the system.
Then you haven't tried Java Flight Recorder/Mission Control or the new javosize. BEAM doesn't come close... :)
Amen! Been doing that to the extent possible for a while and it is terrific!
So, it's not far-fetched. It will likely be a series of DSLs like the above, or iMatix's model-driven development approach. These would specify the system at a high level with precise requirements and constraints. Then, planning software with heuristics would produce the code, with similar systems for integration. Several people's worth of work, or 10-20 tools, becomes one person with one set of tools. I doubt we'll replace the person or the need for some programming tools.
http://www.slideshare.net/BrianTroutwine1/erlang-lfe-elixir-...
> In many systems, Java included, the Garbage Collector (GC) must examine the entire heap in order to collect all the garbage. There are optimizations to this, like using Generations in a Generational GC, but those optimizations are still just optimizations for walking the entire heap. BEAM takes a different approach, leveraging the actor model on which it is based: If a process hasn’t been run, it doesn’t need to be collected. If a process has run, but ended before the next GC run, it doesn’t need to be collected
Well, how does BEAM know which process ran (so that its garbage should be collected)? Bookkeeping, of course, and that is also "just an optimization". Similarly, if a JVM object hasn't been touched since the last collection -- it doesn't need to be examined.
> If, in the end, the process does need to be collected, only that single process needs to be stopped while collection occurs
And new HotSpot GCs rarely stop threads at all for more than a few milliseconds (well, depending on the generation; it's complicated), collecting garbage concurrently with the running application, and other JVMs have GCs that never stop any thread for more than 20us (that's microseconds) or so.
While BEAM's design helps it achieve good(ish) results while staying simple, the fact is that the effort that's gone into HotSpot gets it better results for even more general programs (collecting concurrent, shared data structures -- like ETS -- too).
I've said it before and I'll say it again: Erlang is a brilliant, top notch language, which deserves a top-notch VM, and the resources Erlang/BEAM currently have behind them are far too few for such a great language. Erlang's place is on the JVM. JVMs are used for many, many more soft-realtime (and hard-realtime) systems than BEAM, and yield much better performance.
An implementation of Erlang on the JVM (Erjang) done mostly by one person, was able to beat Erlang on BEAM in quite a few benchmarks, and that was without the new GCs, the new (or much improved) work-stealing scheduler and the new groundbreaking JIT (which works extremely well for dynamically-typed languages[1]).
OpenJDK could free Erlang programs from having to write performance-sensitive code in C (so many Erlang projects are actually mixed Erlang-C projects). While Erlang can be very proud of how much it's been able to achieve with so little, instead of fighting the JVM (or, rather, JVMs), it should embrace it. Everyone would benefit.
[1]: https://twitter.com/chrisgseaton/status/586527623163023362 , https://twitter.com/chrisgseaton/status/619885182104043520
Programming Erlang (authored by the creator of Erlang) says without any qualification at all that "Concurrent programs are made from small independent processes. Because of this, we can easily scale the system by increasing the number of processes and adding more CPUs."
When I read that I was expecting it to be followed by "ha ha... not really, because of algorithmic sequential dependencies and Amdahl's Law, of course!" but it isn't!
You can have an infinite number of processes but if the dataflow graph they form doesn't have any parallelism then Erlang and BEAM aren't likely to be able to work any magic to make them so. Even if it did have parallelism it is only going to have so much and you certainly won't be able to arbitrarily scale it beyond that by increasing the number of processes.
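To put a number on that, Amdahl's law gives the ceiling directly: if a fraction p of the work is parallelizable, the best possible speedup on n cores is 1 / ((1 - p) + p/n). A quick Elixir calculation (my own illustration):

```elixir
# Amdahl's law: the sequential fraction caps the speedup, no matter
# how many processes or cores you throw at the problem.
speedup = fn p, n -> 1.0 / ((1.0 - p) + p / n) end

on_8_cores = speedup.(0.95, 8)          # ~5.9x, not 8x
ceiling    = speedup.(0.95, 1_000_000)  # ~20x: 1 / (1 - 0.95), the hard limit
```

With 95% of the work parallelizable, a million processes still can't beat 20x. Spawning more processes only helps if the dataflow graph actually has the parallelism.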
What's more, the typical advice about mutable shared state in Erlang is to encapsulate it safely in a process -- which seems to me a recipe for further serialisation, and so a crazy thing to promote!
Erlang's goal is to take problems that are embarrassingly parallel in theory and make them embarrassingly parallel in practice. Serving a billion independent http requests in a distributed, parallel manner can technically be done in Java or C or assembly. But, it's very hard to do well and very easy to screw up in painful, confusing, life-wasting ways. Erlang makes it much easier to do well and much harder to screw up.
Chapter 26 of Programming Erlang, 2nd Edition, Programming Multicore CPUs, quite explicitly notes the problem of avoiding sequential bottlenecks, and even devotes an entire exercise to parallelizing a sequential program.
The point is scaling. Think in terms of request rate. If you know you can have millions of processes per machine and they run well in parallel, then you can handle requests with processes and stop worrying.
I just want to clarify: it's concurrent, not parallel. Erlang doesn't promise parallelism. You can get parallelism from concurrency, but not the other way around; Erlang only enables concurrency, and you may get parallelism because of it.
Data is immutable, so we don't have to worry about keeping data coherent between... anything. Whether it is two processes or two nodes. New data can be constructed with reference to old data without fear that the old data will be modified. So, "mutation" is really just new data with a reference to the old, unchanged data. This greatly lowers the churn in creating new data. It also means everything can just pass (process to process or node to node) what it has without fear it will be out of date.
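A tiny Elixir example of what that looks like in practice (my sketch): "updating" returns new data, the original is untouched, and unchanged parts are shared rather than copied.

```elixir
# "Updating" a map returns a new map; the old one is unchanged,
# and the parts that didn't change are shared between the two.
old = %{name: "alice", roles: [:admin]}
new = Map.put(old, :name, "bob")

# old.name  => "alice", still
# new.name  => "bob"

# Lists share structure the same way: prepending reuses the old
# list as the new list's tail, no copying required.
xs = [2, 3]
ys = [1 | xs]
```

This sharing is why immutable "copies" are cheap, and why a message can be handed to another process with no fear it'll change underneath anyone.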
Everything is defined in modules. Modules define what we would think of in OOP as namespaces, structures, classes/types, and class functions. Importantly, they only define functionality; modules do not have state. Therefore, functions accept some set of inputs, create new data from the inputs (no mutation), and return some output. This makes it very easy to reason about what the code is doing if you keep the modules well defined and reasonably sized. This code can be shared around easily, too: it's got no state and is immutable.
Processes are an abstraction. You can think of them as threads, but they're really just a stack and a little bookkeeping. A BEAM VM will normally run a number of real threads equal to the number of CPUs in the machine. Each real thread will exclusively pick a process, load the bookkeeping, point itself at the stack, and execute bytecode for a period of time. When done, it records the changes in the bookkeeping and moves to the next process. This is very lightweight, so literally millions can run on a single computer. Because they are self-contained, they're easy to clean up. Processes also expose a standard interface for communication: message passing via per-process mailboxes. Again, immutable messages are sent back and forth, so it doesn't matter whether it's the same node or not.
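The communication part in Elixir terms (a minimal sketch of my own): `send` drops an immutable message in the other process's mailbox, `receive` pulls one out, and that's the entire interaction surface between processes.

```elixir
# Two processes interacting only via messages: no shared mutable state.
parent = self()

pid =
  spawn(fn ->
    receive do
      {:double, n, from} -> send(from, {:result, n * 2})
    end
  end)

send(pid, {:double, 21, parent})

result =
  receive do
    {:result, v} -> v
  after
    1_000 -> :timeout
  end
```

Because the message is conceptually copied into the receiver's mailbox, the same code works unchanged whether `pid` lives on this node or another one.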
Finally, everything is abstracted to the notion of nodes within a cluster. By default, anything you execute runs on the local node, but you can specify otherwise. I can execute a module call on another machine or spawn a new process on another machine. It just means a little more information in the call, but it's the exact same concept programmatically. Also, it's possible to group processes into named services: you can call a named service and it will know which processes to contact. It's a very low barrier to entry to parallelize your code if you just write it that way.
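The "a little more information in the call" is literally just a node name. A sketch using the local node (since a real cluster needs distribution started, e.g. with `--sname`; swapping in `:"other@host"` is the only change):

```elixir
# Node.spawn/2 takes a node name; here we target our own node.
# Before distribution is started, Node.self() is :nonode@nohost,
# and spawning "on" it just spawns locally -- same call shape either way.
target = Node.self()

parent = self()
_pid = Node.spawn(target, fn -> send(parent, {:hello, Node.self()}) end)

from =
  receive do
    {:hello, node_name} -> node_name
  after
    1_000 -> :timeout
  end
```

Module calls work the same way via `:rpc.call(node, module, function, args)`: identical semantics, one extra argument.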
When you start thinking in terms of how to structure your code for BEAM, you inherently get easy access to scalability.
The ease of scaling across machines, fault tolerance, and low latency variation are the more typical selling points. Besides that, god forbid Erlang become just another JVM language; I embrace competition.
What does that have to do with the VM implementation?
> fault tolerance
True, that is a good selling point -- in theory. Indeed, BEAM's process isolation is better than the JVM's on paper. In practice, so many Erlang systems have so much C in them (because Erlang isn't fast enough for the data plane), that they can still bring down the entire VM (not as if there aren't other ways of doing that even without native code), or they interfere with one another in other ways because of BEAM's poor support for shared concurrent data structures.
> low latency variation
Nothing that can't be achieved on the JVM. Much of the low-latency Erlang enjoys is because relatively little data is kept on the Erlang heap anyway, and whatever significant amount of data is kept on the Erlang heap, it's in non-GCed ETS. If that's your way of achieving low latency variation, Erlang can do better on HotSpot.
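For readers unfamiliar with ETS: it's an in-memory term store that lives outside any process's GC-managed heap, which is exactly why big working sets in ETS don't add to per-process collection work. A minimal Elixir sketch (mine):

```elixir
# An ETS table is owned by a process but its data is stored off-heap,
# so it is not traversed when that process's heap is collected.
table = :ets.new(:cache, [:set, :public])

:ets.insert(table, {:user_1, %{name: "alice"}})
:ets.insert(table, {:user_1, %{name: "bob"}})  # :set keys are unique; overwrites

[{:user_1, user}] = :ets.lookup(table, :user_1)
# user.name => "bob"
```

Reads and writes do copy terms between the table and the process heap, which is the trade-off for keeping the data out of the GC's way.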
> Besides that, god prevent erlang to become just-another-JVM-language, I embrace competition.
If your goal is not to have the best language environment you can but to show the world you have impressive results for the effort you've put in, then that's a whole other discussion.
And if all you want is competition, you can have Erlang on BEAM and the JVM. Why tie the language to one VM? Many JVM languages compile to JavaScript, too (Clojure, Kotlin, Scala, Fantom, and probably more).
Truth be told, most of the crowd using BEAM doesn't care if it's a bit slower than Java. They just want easy scaling, distribution, and fault-tolerance. A different code-base than Java's is a plus in terms of increasing implementation diversity and avoiding the bullseye currently on Java.
That bullseye exists only in the minds of some HNers. Here is a very (very!) partial list of companies running primarily or largely on the JVM: Google, Twitter, Netflix, LinkedIn, Box, IBM, SAP, Amazon, eBay.
> They just want easy scaling, distribution, and fault-tolerance.
... So they write chunks of their code in C. That would be completely unnecessary if they'd just run Erlang on the JVM.
I'm just learning Elixir (and therefore erlang/BEAM somewhat) and one thing that's cool to me is that a piece of code that's taking too long to execute can be paused by the VM while it switches to another thing, which keeps the latency down. I think, like, each process has some number of "ticks" or something before it switches away.
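(The "ticks" are called reductions: each process gets a budget of a few thousand per time slice and is swapped out when it's spent, even mid-computation. A rough demonstration of my own, pinning the VM to one scheduler so a busy loop and the main process genuinely compete for the same core:)

```elixir
# Preemption demo: a pure busy loop cannot starve other processes,
# because the scheduler deducts reductions and forcibly swaps it out.
old = :erlang.system_flag(:schedulers_online, 1)  # everything on one core

loop = fn loop -> loop.(loop) end
busy = spawn(fn -> loop.(loop) end)               # spins forever, never yields

# If `busy` could hog the core, this timer message would never be handled.
Process.send_after(self(), :still_alive, 100)

got =
  receive do
    :still_alive -> true
  after
    2_000 -> false
  end

Process.exit(busy, :kill)
:erlang.system_flag(:schedulers_online, old)      # restore scheduler count
```

You can also peek at a process's spent budget with `Process.info(pid, :reductions)`.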
Can erlang on the JVM do that?
Edit: Also, the other thing that majorly attracts me to Elixir/Erlang is OTP (applications, genservers, supervision trees with restart strategies, etc). Are there any plans to port those libraries/philosophy into Quasar?
> Can erlang on the JVM do that?
Of course it can! Just like BEAM does it. (In fact, Quasar used to do that, too. We took out that feature because Quasar also gives you access to kernel threads, and processes that take too long can just be moved to kernel threads, which do this kind of preemption better anyway. But an Erlang implementation on the JVM can behave just as Erlang does on BEAM.)
$8000 per machine, though.
The G1 collector that will be made default in Java 9 might make some applications effectively pauseless too on some workloads.