Fibers implemented with bytecode instrumentation also have some (small) added overhead (which is why we'd like them to be built directly into the JVM), but this makes little difference in practice: HotSpot's compiler is so good that with any added real work, that overhead becomes negligible, and the compilation quality means that overall performance exceeds anything that can be achieved by Erlang VMs.
Also, the report says: "the JVM has a single heap and sharing state between concurrent parties is done via locks that guarantee mutual exclusion to a specific memory area. This means that the burden of guaranteeing correct access to shared variables lies on the programmer, who must guard the critical sections by locks". This is grossly inaccurate. While it is true that the JVM has a shared heap, this means that it can allow programs to share mutable state among threads -- not that it necessarily does so. The JVM leaves the concurrency model up to the language implemented on top of it (just as the hardware and OS support a shared heap, but various languages may choose not to expose that as a visible abstraction to program code). E.g. Clojure only allows shared mutable state if it enforces transactional modifications. Erlang also allows this kind of shared state via ETS; the difference is that ETS must be programmed in C, whereas on the JVM you can write the shared data structure in a JVM language. This also means that on the JVM, objects stored in such concurrent data structures are handled by the GC, whereas in Erlang (IIRC) ETS causes some issues with GC (EDIT: in fact, ETS data is not garbage collected at all).
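To make the ETS comparison concrete, here is a minimal Java sketch of a shared table written in a JVM language rather than C. The class name SharedTable and its methods are hypothetical, made up for illustration; the only real API used is the JDK's ConcurrentHashMap and ExecutorService. Entries are ordinary heap objects, so the GC handles them like anything else.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical ETS-like table: shared by all threads and written in plain Java.
final class SharedTable {
    private final ConcurrentHashMap<String, Long> rows = new ConcurrentHashMap<>();

    // Atomically adds delta to the counter under key; callers take no locks.
    long increment(String key, long delta) {
        return rows.merge(key, delta, Long::sum);
    }

    // Returns the current counter value, or 0 if the key was never written.
    long get(String key) {
        return rows.getOrDefault(key, 0L);
    }

    // Starts `workers` tasks that each bump the same key `perWorker` times,
    // then returns the final counter value.
    static long demo(int workers, int perWorker) {
        SharedTable table = new SharedTable();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                for (int i = 0; i < perWorker; i++) {
                    table.increment("hits", 1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 10_000));
    }
}
```

A language on top of the JVM could expose only something like this table (or Clojure's transactional references) and hide raw shared fields entirely; the sketch only shows that the shared structure itself needs no native code.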
I believe that the JVM is a strict superset of any Erlang VM. In particular, HotSpot (the OpenJDK's JVM) is so well implemented, that the main difference -- even when running programs that behave just like Erlang programs -- is a huge boost in performance, and never needing to use C to achieve either good performance or some behavior that is unsupported by Erlang semantics.
You have a lot of data.
You rarely change that data.
You still have to walk over it when you garbage collect.
ETS allows you to store Erlang terms into a tuple space outside the heap of any process. This means you avoid garbage collecting, and you can have 120 gigabytes of RSS, but only have to collect 400 megabytes of those. If you look at how many "mmap" implementations there are for GC'ed languages, and if you have ever reached for one, you know what I'm talking about.
Erlang wins by a large margin because the JVM has to garbage collect and block everyone while doing so. This is not desirable in a soft-realtime system.
I'd definitely not order one as the strict superset of the other.
Not even close (certainly not when using Quasar). Whether or not you have large GC pauses depends on how you use the heap. If you only allocate objects that live for the duration of the request (which can be enforced by your choice of JVM language) you get the same GC behavior as Erlang (only more general), and a much, much, much, better computation performance, due to HotSpot having one of the world's most advanced optimizing compilers.
However, it's worth noting that the test only runs for 15 seconds. It might be an interesting addition to run it for 10 minutes and measure 99.9% latency, as jlouis proposes, and prove one of you correct.
https://www.techempower.com/benchmarks/#section=data-r10&hw=...
Yet people seem to think OpenJDK is the only one.
Furthermore, Erlang has some things against it in the speed department:
It is forced functional.
It is dynamically typed (A tracing JIT can somewhat alleviate this problem).
But I think the BEAM architecture is pretty sound. It has a lot of things in common with a microkernel, architecturally. And while microkernels are not popular currently, they see a lot of use in the embedded space, on satellites and so on.
Isolated garbage collection. Limiting GC to a single process/thread. It's my understanding that this is only possible with individual heaps, prohibiting shared memory. I see this as a polarizing tradeoff (with pros and cons). I don't think the JVM does this, and therefore isn't a superset.
Preemptive scheduling and soft real time. It's my understanding that Quasar bridges this gap by looking for natural points in execution to insert a pause/continue (oversimplified I'm sure). What are the guarantees? Erlang provides fairly strong guarantees about fair scheduling, making it very suitable for soft real-time systems. Would Quasar, for instance, pause a tight for loop if execution was running long? Or does it only pause on blocking operations like IO?
Yes, but the only advantage is a simpler GC algorithm. HotSpot's GCs are so advanced (let alone HotSpot descendants like Zing) that they give you similar behavior even without isolation, and they include collection of shared data structures, too.
> Would Quasar, for instance, pause a tight for loop if execution was running long? Or does it only pause on blocking operations like IO?
Quasar does full preemptive scheduling but not time-slice-based preemption. This is not because it can't -- as a matter of fact, early versions of Quasar did time-slice-based preemption, but it turned out to provide no benefit whatsoever. In fact, Erlang's behavior is a limitation. The reason is that Erlang has only one type of thread -- the process, or the user-mode thread -- and therefore has to handle any type of thread, including those that are computation heavy. Thing is, work-stealing is a great scheduling mechanism for transaction-serving threads (that block very often), but not so good for computational threads. Quasar lets you choose: fibers for transactional stuff, and plain threads for long-running computations; both are abstracted into what we call a "strand".
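As a rough analogy for that split, here is a sketch using only the plain JDK. It is not Quasar's API (no fibers or strands appear here) and the class and method names are hypothetical. It just shows the scheduling choice being described: many short tasks go to a work-stealing pool, while a long-running computation gets its own dedicated thread so it cannot hog a pool worker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; plain JDK executors stand in for fibers here.
final class SplitScheduling {
    // Many short "transaction-like" tasks share a work-stealing pool.
    // Each task returns id % 2; the method returns the sum of all results.
    static int runShortTasks(int count) {
        ExecutorService pool = Executors.newWorkStealingPool();
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                tasks.add(() -> id % 2);
            }
            int sum = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                sum += f.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // A long-running computation (sum of 1..n) runs on its own OS thread,
    // scheduled by the kernel instead of the work-stealing pool.
    static long runLongComputation(int n) {
        long[] result = new long[1];
        Thread t = new Thread(() -> {
            long acc = 0;
            for (int i = 1; i <= n; i++) {
                acc += i;
            }
            result[0] = acc;
        }, "compute");
        t.start();
        try {
            t.join(); // join makes the write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runShortTasks(10));
        System.out.println(runLongComputation(1000));
    }
}
```

The tight loop in runLongComputation is exactly the case asked about above: the OS preempts it like any other thread, so it never starves the short tasks.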
I have a chip on my shoulder about limited concurrency abstractions. I've been burned too many times by Akka and thread pools and Scala Parallel collections etc. I've always appreciated the fact that while I won't get the most performance out of BEAM, there's very little I can do wrong that will break my application.
I'm going to have to spend some time with Quasar. If your claims hold up, then Clojure plus Quasar seems like a fantastic platform.
I don't get how JVM is a strict superset of Erlang VM when it doesn't have pre-emption.
If BEAM's feature set has a preemptive scheduler as an element and the JVM, IIRC, does not have such a feature, then the JVM is not a superset, let alone a strict superset.
There isn't any at all for ETS:
Note that there is no automatic garbage collection for
tables. Even if there are no references to a table from
any process, it will not automatically be destroyed
unless the owner process terminates. It can be destroyed
explicitly by using delete/1. The default owner is the
process that created the table. Table ownership can be
transferred at process termination by using the heir
option or explicitly by calling give_away/3.
Of course, you'll seldom be using ETS directly, but rather through Mnesia.
Mostly the comment about "adding complexity", comes from my own biased experience of working with bytecode instrumentation. I work on the JRebel team @ZeroTurnaround and as you can imagine, we have to do quite a bit of instrumentation to get this nice reloading behaviour. Though as Murphy's law states, if something can go wrong it will and if something goes wrong after instrumentation then debugging it will not be a pleasant task. Which of course doesn't mean that it can't work nicely eventually, proven by our large number of happy customers.
Quasar's own documentation states that "If you forget to mark a method as suspendable ... you will encounter some strange errors. These will usually take the form of non-sensical ClassCastExceptions, NullPointerExceptions, or SuspendExecution being thrown". I think forgetting something is a very human thing to do and when you've got a large codebase then debugging such exceptions is what I meant by "adding complexity". But again this was only a speculation.
Regarding the other comment about shared mutable state, then of course it isn't necessarily required and I did write in the summary that "Java and the JVM provide enough tools to retrofit any concurrency model out there, but retrofitting anything won’t be the same as taking it into the initial design", which I do think still holds and I believe is one of the reasons why using libraries like Quasar won't be a trouble free experience, at least not until it's somewhat built into the JVM.
Though I won't even try to claim to know the ultimate truth and I'm always grateful to have someone correct me when I'm wrong. I appreciate your efforts in trying to make the JVM a better/more versatile platform with Quasar. I think that any such bold undertaking will only benefit the ecosystem in the long run.
> Java and the JVM provide enough tools to retrofit any concurrency model out there, but retrofitting anything won’t be the same as taking it into the initial design
The concurrency model requires no retrofitting. It is simply a strict superset of Erlang's. The computer and OS also support a full shared-memory concurrency model, yet it can be restricted -- not retrofitted -- to run languages like Erlang or Rust, with a more restricted model. Same goes for the JVM. It has a general-purpose shared heap, but any language may restrict its use. No retrofitting is required. Quasar doesn't impose any further restrictions (that's not its job, which is simply to provide fibers), but a language like Clojure certainly does. Clojure is no less safe than Erlang. The implication is that an Erlang running on the JVM requires no C code to implement something like ETS, but the underlying JVM semantics are no more foreign to Erlang than the underlying machine semantics, and vice-versa: Erlang is no more foreign to the JVM than to the hardware. Erlang simply places restrictions on their use, and they both provide lower-level abstractions.
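A small sketch of what "restricting, not retrofitting" can look like on the JVM: an Erlang-style process whose state is private to one thread and reachable only through a mailbox. Everything here (CounterProcess, spawn, send, the STOP sentinel) is hypothetical and for illustration; it uses only the JDK's LinkedBlockingQueue and Thread, and says nothing about how Quasar or any Erlang-on-JVM implementation actually does it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical Erlang-style process: private state plus a mailbox.
final class CounterProcess {
    // Hypothetical sentinel message that tells the process to exit.
    static final int STOP = Integer.MIN_VALUE;

    private final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::loop);
    private long total; // written only by the worker thread

    private CounterProcess() {}

    // Creates the process and starts its receive loop.
    static CounterProcess spawn() {
        CounterProcess p = new CounterProcess();
        p.worker.start();
        return p;
    }

    // Receive loop: handle one message at a time until STOP arrives.
    private void loop() {
        try {
            for (int msg = mailbox.take(); msg != STOP; msg = mailbox.take()) {
                total += msg;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Asynchronous send: the caller never touches the process state.
    void send(int msg) {
        mailbox.add(msg);
    }

    // Sends STOP, waits for the worker to exit, and returns the final total.
    long stopAndGet() {
        send(STOP);
        try {
            worker.join(); // join makes the worker's writes to total visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        CounterProcess p = spawn();
        p.send(1);
        p.send(2);
        p.send(3);
        System.out.println(p.stopAndGet());
    }
}
```

The heap underneath is still shared; the restriction lives entirely in what the abstraction lets callers reach, which is the point being made about languages layered on the JVM.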
One major thing that turned me off of the language for our particular project was that there was no way to have a large read-only data structure in RAM and sic a bunch of parallel threads on it to analyze or search it. It is hackable if you mash your data structure into a large byte binary, since large bins live on a shared heap, but it's a pain to deal with a non-native representation for your data.
It's very obvious that Erlang is optimized for a particular usage model, and even when you're dealing with a number of functional parallel processes, it might still clash with what you're doing.
Same thing as with ETS or SQL.
Implementing traversal-heavy algorithms across a large in-RAM data network does not require external tools, and is not helped either in performance or simplicity by decoupling direct pointer-based linkages into hash keys, and speaking to external interfaces. At least not for the work we were doing.
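For contrast, here is roughly what that workload looks like on a shared-heap runtime, as a minimal Java sketch. The names (SharedGraphSearch, Node, parallelCount) and the toy binary tree are hypothetical stand-ins for a real in-RAM data network. Nodes are linked by direct references, built once, never mutated, and then walked by several threads at the same time without copying or key lookups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical read-only structure traversed by several threads at once.
final class SharedGraphSearch {
    // Immutable node linked to its children by direct object references.
    static final class Node {
        final int value;
        final List<Node> children;

        Node(int value, List<Node> children) {
            this.value = value;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }
    }

    // Builds a complete binary tree; each node's value is its height (leaves are 0).
    static Node build(int depth) {
        if (depth == 0) {
            return new Node(0, Collections.<Node>emptyList());
        }
        List<Node> kids = new ArrayList<>();
        kids.add(build(depth - 1));
        kids.add(build(depth - 1));
        return new Node(depth, kids);
    }

    // Sequential traversal: counts nodes in the subtree whose value equals target.
    static long count(Node n, int target) {
        long c = n.value == target ? 1 : 0;
        for (Node child : n.children) {
            c += count(child, target);
        }
        return c;
    }

    // Parallel traversal: one thread per child of the root, all reading
    // the same in-memory structure.
    static long parallelCount(Node root, int target) {
        AtomicLong total = new AtomicLong(root.value == target ? 1 : 0);
        List<Thread> threads = new ArrayList<>();
        for (Node child : root.children) {
            Thread t = new Thread(() -> total.addAndGet(count(child, target)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        Node root = build(10);
        System.out.println(parallelCount(root, 0));
    }
}
```

Because the structure is immutable after construction, the readers need no locks; the only synchronized state is the AtomicLong that collects their results.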
It also reminded me how much I wish there was an actual specification for BEAM; finding out the details of how the bytecodes work is an arduous task compared to the JVM, where everything is explicitly stated. IMHO both VMs are excellent in their own right, and the HotSpot JIT is incredible, but I still can't deny that I find the BEAM process model on concurrency more elegant than the JVM one, though in practice I've only toyed with Erlang so I have no real-world grounds of comparison there. Does anyone by chance?
The paper could also use an evaluation section - e.g. implementing a solution to a well known problem like the Dining Philosophers in both languages and comparing both the code and runtime characteristics.
Otherwise, a decent high-level overview.