I work on a Storm library for Python called streamparse[2]. The goal of the project is to let us easily achieve Erlang OTP-style reliable processing atop open source infrastructure while still writing pure Python code.
I also gave a PyCon talk about streamparse, which you can find on YouTube[3]. It describes the motivation for the project: solving a large-scale real-time analytics problem with Python, doing so reliably, and beating Python's multi-core limitations (the GIL, etc.).
[1]: https://storm.apache.org/
That reliability wasn't the case for at least one of the internet heavyweights; quite the opposite [1].
[1] http://blog.acolyer.org/2015/06/15/twitter-heron-stream-proc...
"OTP contains functions to safely spawn and initialize processes, send messages to them in a fault-tolerant manner and many other things."
Also, "supervisors are one of the most useful parts of OTP you'll get to use." And, "an OTP application specifically uses OTP behaviours for its processes, and then wraps them in a very specific structure that tells the VM how to set everything up and then tear it down."
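The supervisor idea quoted above can be sketched in a few lines of Python. This is a hypothetical toy and not OTP's API. It assumes a synchronous child function and a simple per-child restart budget. Real OTP supervisors run children as separate processes, measure restart intensity over a time window, and offer other strategies such as one_for_all.

```python
from typing import Callable


class Supervisor:
    """Toy one-for-one supervisor (hypothetical, not OTP's API): re-run a
    crashed child until it returns normally or its restart budget runs out."""

    def __init__(self, max_restarts: int = 3) -> None:
        self.max_restarts = max_restarts
        self.restarts: dict[str, int] = {}

    def run_child(self, name: str, work: Callable[[], object]) -> object:
        self.restarts.setdefault(name, 0)
        while True:
            try:
                return work()
            except Exception as exc:
                # "Let it crash": the child has no recovery logic, only a restart here.
                if self.restarts[name] >= self.max_restarts:
                    raise RuntimeError(f"{name}: restart limit reached") from exc
                self.restarts[name] += 1


attempts = {"n": 0}


def flaky() -> str:
    # Hypothetical worker that crashes twice, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ValueError("transient failure")
    return "ok"


sup = Supervisor(max_restarts=3)
print(sup.run_child("flaky", flaky), sup.restarts["flaky"])  # ok 2
```

The point of the design is that error handling lives in the supervisor, not in the worker, which is the part Storm's worker restarts resemble.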
Given that, I don't see how you could think Storm isn't at all related. (I never said it was equivalent, just that it has very similar goals.)
Also, my team is working on an open source example project called birding that uses streamparse and pykafka together to build filtered firehose tweet streams from Twitter's public API. We think this will illustrate both Storm and Kafka really well and go beyond the typical word-count examples. See https://github.com/Parsely/birding to track that effort.
So far Storm is working as advertised. Frankly, what it does isn't that terribly difficult, but it's good to have a well-tested implementation of it so we can focus on our business logic.
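The core of the reliability guarantee is at-least-once delivery: a message stays pending until it is acknowledged, and a failed message is replayed. Below is a minimal sketch of that bookkeeping. The class and method names are hypothetical and are not Storm's or streamparse's API. It also omits Storm's tuple-tree tracking and timeouts.

```python
from collections import deque


class ReplayingSource:
    """Toy at-least-once source (hypothetical names): a message stays pending
    until acked, and a failed message is queued again for redelivery."""

    def __init__(self, messages):
        self.queue = deque(enumerate(messages))  # (msg_id, payload)
        self.pending = {}

    def next_message(self):
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        self.pending.pop(msg_id)

    def fail(self, msg_id):
        # Redeliver later; downstream must tolerate possible duplicates.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


source = ReplayingSource(["a", "b", "c"])
processed, failed_once = [], set()
while (item := source.next_message()) is not None:
    msg_id, payload = item
    if payload == "b" and msg_id not in failed_once:
        failed_once.add(msg_id)  # simulate one downstream failure
        source.fail(msg_id)
        continue
    processed.append(payload)
    source.ack(msg_id)
print(processed)  # ['a', 'c', 'b']
```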
Scala/Java: Akka http://akka.io
Haskell: Cloud Haskell http://haskell-distributed.github.io/
And none of the above has solved preserving types across machine boundaries (or at least not in a stable, non-experimental, let's-bet-the-bank-on-this release).
Quasar looks pretty insane at first blush; if Java 9 or 10 delivers on the missing piece [1] for this library, watch out.
[1] "Quasar fibers are implemented by creating and scheduling continuation tasks and since the JVM doesn’t (yet) support native continuations, Quasar implements them through selective bytecode instrumentation: methods that can block a fiber currently need to be explicitly marked through annotations so that Quasar can insert the continuation suspension and resumption hooks."
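For comparison, Python generators are a language-level form of suspendable function, so a toy fiber scheduler needs no bytecode instrumentation. The explicit `yield` plays roughly the role that Quasar's annotations play in marking where a fiber may suspend. This is an analogy only. It is not how Quasar works, and it gives none of the JVM's multi-core scheduling.

```python
from collections import deque
from typing import Generator

Fiber = Generator[None, None, None]


def run(fibers: list[Fiber]) -> None:
    """Round-robin scheduler: each `yield` inside a fiber is a suspension point."""
    ready = deque(fibers)
    while ready:
        fiber = ready.popleft()
        try:
            next(fiber)          # resume until the fiber's next yield
        except StopIteration:
            continue             # fiber finished; drop it
        ready.append(fiber)      # still alive; reschedule at the back


log: list[str] = []


def worker(name: str, steps: int) -> Fiber:
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # explicit suspension point


run([worker("a", 2), worker("b", 3)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```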
Would you say that it's missing something?
What about Static Pointers[0]? Do you feel they solve the problem, or are likely to, if you characterize them as non-stable and/or experimental?
0: https://ocharles.org.uk/blog/guest-posts/2014-12-23-static-p...
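The idea behind static pointers is that both machines run the same code, so a stable key can stand in for a top-level function instead of serializing the function itself. Here is a rough Python sketch of that idea. The registry and helper names are hypothetical. Unlike GHC, which checks at compile time that a `static` expression is closed, this sketch only checks at runtime that the function was registered.

```python
import json
from typing import Callable

# Hypothetical registry shared by sender and receiver (same code on both sides).
REGISTRY: dict[str, Callable[..., object]] = {}


def remotable(fn: Callable[..., object]) -> Callable[..., object]:
    REGISTRY[f"{fn.__module__}.{fn.__qualname__}"] = fn
    return fn


@remotable
def add(x: int, y: int) -> int:
    return x + y


def encode_call(fn: Callable[..., object], *args: object) -> bytes:
    # Only the stable key and plain-data arguments go on the wire.
    key = f"{fn.__module__}.{fn.__qualname__}"
    if key not in REGISTRY:
        raise ValueError(f"{key} is not registered as remotable")
    return json.dumps({"fn": key, "args": list(args)}).encode()


def run_call(wire: bytes) -> object:
    msg = json.loads(wire)
    return REGISTRY[msg["fn"]](*msg["args"])


print(run_call(encode_call(add, 2, 3)))  # 5
```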
You could, I suppose, hack something together using multiple JVMs per machine (to avoid global GC) and Akka, and implement some custom live diagnostics, but if you're really doing a telecommunications project there's no real shortcut. Plain Akka will do in many cases, but it's not a thoroughbred like Erlang. Give up: Erlang just can't be beaten on its home turf by a framework.
There is nothing Erlang VMs provide that the JVM doesn't, except for some guarantees that are simply different on the JVM, and I think most applications would prefer the JVM's advantages anyway.
To get Erlang/OTP you would have to replicate it from the beginning in C, and there's no point in that when you can just use Erlang.
Can you find the magic sauce in another language?
I believe the honest answer is no. You can't do it in Java because the JVM doesn't support it. You can't really build this on top of something that was not designed with parallelism and concurrency in mind, and designed correctly. For that reason Go doesn't do it either. Scripting languages aren't even thinking about it, and to do it in something like C or C++ you'd basically be recreating a poorly implemented, half-working version of Erlang.
So the more interesting question is: why do people have this aversion to Erlang and Elixir?
It takes only a couple of weeks to get used to Erlang programming, and Elixir is even faster, I think.
For the record, I've been taking interns through Programming Elixir in about 2 weeks and getting them to the point where they can program in the language. (I'm just at that point now with a new group, so we'll see how much they struggle... but I've already seen code written by them.)
Erlang did it right and has put in several decades of effort getting it even more right. Other languages, like Go, aren't even trying (though of course advertising "concurrency" is very popular, none of them really do it.)
All these people choosing things like Node.js or Go for problems where Erlang is the correct solution make me think that too many of the people in our industry aren't really engineers, but more scripters following the herd.
An engineer knows why Erlang is the correct solution for distributed systems.
Because you may want good performance without dropping down to C. Many serious Erlang applications are mixed Erlang/C applications, where they delegate processing to C, or use Erlang for the control plane and C for the data plane.
Or because you want more sophisticated, fast transactional shared-state than ETS allows.
Or because you want to take advantage of a much wider selection of high-quality libraries.
Note that these aren't problems with Erlang or Elixir as languages -- they're both great -- but rather with the runtime they're running on, whose developers simply don't have the resources to make it all that it can be. Which brings me to:
> the JVM doesn't support it.
The JVM most certainly does support it. It was designed with parallelism and concurrency in mind, and includes some of the best high-quality, production-ready implementations of concurrent data structures and schedulers in existence. The effort that's been poured into the JVM -- and even JVM concurrency alone -- dwarfs the effort put into Erlang by at least an order of magnitude.
I think Erlang would benefit greatly if it were to target the JVM. There is a project, Erjang, that runs BEAM code on the JVM, but it is not very actively maintained and doesn't take advantage of some of the recent, relevant innovations in the Java world. Since that project was developed, there have been major improvements in the pertinent areas on the JVM: concurrent GCs, JITting of dynamically-typed languages, fibers and work-stealing scheduling.
OpenJDK is the second-largest open-source project in the world (after the Linux kernel), and I don't see why Erlang shouldn't take advantage of the vast resources it has at its disposal. OpenJDK and Erlang are a great fit!
Can you state which language constructs Erlang has that other languages don't, that are critical to concurrency? Because it looks like someone built a great VM and cluster library, then tossed in a language for fun.
And please explain why Erlang's features are not possible in other languages, not just unavailable at the moment. Or do you mean the guarantees are different? As in, other languages will let you have corrupted shared state? (I view that as a positive; let me escape guarantees when I want to.)
Or perhaps point me to a decent intro that covers all this? I looked at it before and didn't get the real point of why we need Erlang the language, or why, in theory, you can't have OTP on the CLR or JVM.
Erlang has built-in immutability done right (more so than in Scala, OCaml, Clojure(?)) - if it's not built in the language from the start, there's nothing you can do about it.
Erlang has built-in binary serialization/deserialization that makes sending messages across the network easy and natural. Please show me a language that can serialize a closure and send it over the network. A rich type system and general-purpose language features are obstacles here: types of values have to be dumb to serialize seamlessly.
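Python's standard `pickle` module illustrates the point about dumb values. Plain data round-trips much as Erlang terms do through `term_to_binary`/`binary_to_term`. A closure does not, because `pickle` serializes a function by reference to an importable name, and a nested function has no such name. The example message below is made up for illustration.

```python
import pickle

# Plain data round-trips through the standard pickle format.
message = ("order", 42, {"items": ["widget"], "qty": 3})
print(pickle.loads(pickle.dumps(message)) == message)  # True


def make_adder(n: int):
    def add(x: int) -> int:
        return x + n  # closes over n
    return add


# Functions are pickled by reference to an importable name; a nested
# function (closure) has none, so the standard pickler refuses it.
try:
    pickle.dumps(make_adder(1))
    closure_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    closure_ok = False
print(closure_ok)  # False
```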
Other than that, it has a lot of convenient and coherent features: good pattern matching (useful for parsing and routing messages), proper tail-call optimization (just look at Scala, again), the built-in actor model, lightweight processes, process monitoring, and OTP, a large library of battle-tested distributed practices. These things can be recreated in another language or VM, but it would probably take huge resources and effort, and it would mean limiting the platform rather than adding features.
And two different Lisps:
* Lisp Flavored Erlang (http://lfe.io/)
* Joxa (http://joxa.org/)
However it is very useful to have access to actor style concurrency outside of the BEAM ecosystem.
One of the really nice things about Celluloid is you can start up the actor system inside an existing non-threaded and non-actor aware system and write concurrent code.
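The "start an actor inside an ordinary program" idea is not Ruby-specific. Below is a minimal Python sketch of one actor: a private thread that owns the state and processes its mailbox one message at a time. The class and message names are hypothetical, not Celluloid's API. The cast/call split borrows Erlang's gen_server vocabulary.

```python
import queue
import threading


class CounterActor:
    """Minimal actor (hypothetical): its state is touched only by its own
    thread, which handles one mailbox message at a time."""

    def __init__(self) -> None:
        self._count = 0
        self._mailbox: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        while True:
            kind, reply = self._mailbox.get()
            if kind == "incr":
                self._count += 1
            elif kind == "get":
                reply.put(self._count)
            elif kind == "stop":
                return

    def incr(self) -> None:
        # Asynchronous "cast": enqueue and return immediately.
        self._mailbox.put(("incr", None))

    def get(self) -> int:
        # Synchronous "call": enqueue a request, then block on a private reply queue.
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._mailbox.put(("get", reply))
        return reply.get(timeout=5)

    def stop(self) -> None:
        self._mailbox.put(("stop", None))
        self._thread.join()


# Plain threaded code uses the actor without being actor-aware itself.
counter = CounterActor()
callers = [threading.Thread(target=lambda: [counter.incr() for _ in range(250)])
           for _ in range(4)]
for t in callers:
    t.start()
for t in callers:
    t.join()
total = counter.get()
print(total)  # 1000
counter.stop()
```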
Look, the Erlang syntax is not that hard.
Definitely worth taking a look at it.
But recently I've been trying to understand OTP/Erlang and I've had no luck.
Now, seeing these two as alternatives for different languages makes a lot of sense.
With `JRuby`, and more and more with `Rubinius`, actor-based concurrency itself is coming within range of being language-agnostic, with Erlang no longer the default authority on it.
Akka gets a lot of respect, and of course Elixir, Go, Rust, etc. get a lot of concurrency attention... But Celluloid with Ruby is far superior in my opinion, because it is not only performant and functional, it actually allows you to get into the internals and change how your actor system behaves. It's as simple or as complex as you want it to be. Oh, and you don't feel like killing yourself while you're writing code.
Use the right tool for the job. Not the one you like best. Just because you can use the blunt end of a drill to hit a nail, doesn't mean you should give up hammers... You might even break your drill!
If your problem's solution requires extreme concurrency, high availability, low latency, and high throughput, then you must use the BEAM VM. The JVM will be too slow and much less memory-efficient than BEAM.
If concurrency isn't needed, or isn't applicable (parallel or synchronous problems), then BEAM is not the answer, and neither is OTP. E.g. I wouldn't dream of writing an image processor in Elixir; BEAM isn't the right tool to solve that problem.
Also, check out Dialyzer if you think you need type safety. It'll catch plenty of stupid mistakes... I had that urge (for types) coming from the JVM, or from a non-functional, non-dynamic language, but it's actually a straw man. I'd highly encourage you to get past classes and state, so you can really enjoy the platform for what it offers; it's pretty eye-opening.
If I take a high resolution image and spawn N*M BEAM processes which each deal with performing processing on some subcomponent of the image, and then use some scheduling algorithm to reassemble the image after the processing is done, wouldn't you have substantial speedups compared to using some other image processing library in a language like Python?
In the same vein, things like matrix multiplications or other numerical algorithms that are known to parallelize well seem to lend themselves really well to BEAM processes, although the libraries aren't written yet since you can trivially call Python from BEAM anyway.
I'm not sure what makes the real difference with Erlang's processes here; I hope someone can explain this in more detail.
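The split-process-reassemble scheme from the question is sketched below in Python, on a grayscale image stored as nested lists. The names are hypothetical, and the per-tile operation (inversion) is only a placeholder. Passing `ProcessPoolExecutor().map` as `map_fn` would spread tiles across cores, while the default built-in `map` keeps the sketch single-process. Whether any of this beats an existing image library depends mostly on how fast the per-pixel arithmetic runs. Python image libraries commonly do that work in native code.

```python
from typing import Callable, Iterable

Image = list[list[int]]          # grayscale pixels, row-major
Tile = tuple[int, int, Image]    # (grid row, grid column, pixels)


def split_tiles(img: Image, n: int, m: int) -> list[Tile]:
    """Cut img into an n-by-m grid of tiles tagged with their grid position."""
    th = -(-len(img) // n)       # ceiling division: tile height
    tw = -(-len(img[0]) // m)    # ceiling division: tile width
    return [
        (i, j, [row[j * tw:(j + 1) * tw] for row in img[i * th:(i + 1) * th]])
        for i in range(n)
        for j in range(m)
    ]


def reassemble(tiles: Iterable[Tile], n: int, m: int) -> Image:
    """Put tiles back in grid order, whatever order they finished in."""
    grid = {(i, j): pixels for i, j, pixels in tiles}
    out: Image = []
    for i in range(n):
        band = [grid[(i, j)] for j in range(m)]   # tiles covering the same rows
        for parts in zip(*band):
            out.append([p for part in parts for p in part])
    return out


def invert_tile(tile: Tile) -> Tile:
    # Placeholder per-tile work; top-level so a process pool could pickle it.
    i, j, pixels = tile
    return i, j, [[255 - p for p in row] for row in pixels]


def process_image(img: Image, n: int, m: int, map_fn: Callable = map) -> Image:
    return reassemble(map_fn(invert_tile, split_tiles(img, n, m)), n, m)
```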
Java concurrency primitives like ExecutorService and BlockingQueue with support from libraries like Google Guava (ListenableFuture & ListeningExecutorService and, more loosely related, EventBus), as well as libraries like Netty form an excellent basis for concurrent software development in Java. I've developed a number of event processing type applications in Java (including our email delivery system) using message passing styles and future/promise styles and feel like Java does a good job accommodating them, helping them perform well, and allowing them to be easy to operate and troubleshoot. You can make this work all the way down to asynchronous IO at the operating system level if you wish, although in practice we get acceptable performance allowing that lowest level to use blocking IO for simplicity.
It's really easy to set up a "pipeline" of message processors in a Java application that produce and consume from BlockingQueues; or alternatively, submit work to an ExecutorService or publish an event to an EventBus. Maybe not as easy as other languages or frameworks, since there is some scaffolding, but it feels pretty minimal to me. (Again I say this as someone who has not worked in Erlang or other concurrent languages / frameworks extensively.) Unlike some other concurrent code, pipeline-type code is pretty easy to write, understand, and debug. The hardest part is usually orchestrating safe controlled shutdown.
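Staying with one language for the sketches, here is a Python analog of that pipeline. `queue.Queue` plays the role of `BlockingQueue`, and a sentinel object ("poison pill") handles the controlled shutdown mentioned above. This assumes one consumer thread per stage. With several consumers on a stage you would need one pill per consumer.

```python
import queue
import threading
from typing import Callable

STOP = object()  # sentinel ("poison pill") used for controlled shutdown


def stage(inbox: queue.Queue, outbox: queue.Queue,
          fn: Callable[[object], object]) -> threading.Thread:
    def loop() -> None:
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # forward the pill so the next stage stops too
                return
            outbox.put(fn(item))
    t = threading.Thread(target=loop)
    t.start()
    return t


raw, parsed, done = queue.Queue(), queue.Queue(), queue.Queue()
threads = [stage(raw, parsed, str.strip), stage(parsed, done, str.upper)]
for line in ["  hello ", " world  "]:
    raw.put(line)
raw.put(STOP)
for t in threads:
    t.join()

results = []
while (item := done.get()) is not STOP:
    results.append(item)
print(results)  # ['HELLO', 'WORLD']
```

Giving the queues a `maxsize` would add backpressure, much as a bounded `BlockingQueue` does.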
One area where I understand that Java probably does not compete with Erlang is in the reliability of individual processes and threads and whatnot within a machine. There is no simple way in Java, for example, to continue processing requests in the application while you shut down and restart with a new version. However, we typically accommodate problems like this at the next level up by routing traffic away from a machine in preparation for it to receive a software update.
Java's exception handling is also robust. There is not much need to worry about "termination" of individual processes. A top level try/catch block in your message processing system goes a really long way to making the app itself immune to any kind of lower level failure. It might be difficult to convey the supreme confidence of error handling in Java if you've only worked with other languages, but a subjective feeling is: "There is absolutely nothing that can go wrong in any Java code that I might execute that will not unwind in a controlled way through my try/catch blocks and give me a nice, clear stack trace and error message." Whatever thread was handling that work can move onto the next message nicely and cleanly. And beyond this, "There is absolutely no way that any Java code I execute can interact with any part of the application except the objects being passed to it."
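The top-level try/catch pattern carries over directly to other languages. Here is a Python version: each message is handled inside its own `try`, so a failure is logged with its traceback and the loop moves on. Python gives you this first half of the guarantee. It makes no claim to the second half, since any code you call can reach module-level state.

```python
import logging

log = logging.getLogger("worker")


def process_all(messages, handler):
    """Apply handler to each message. A failure is logged with its traceback and
    the loop moves on, so one bad message cannot stop the worker."""
    ok, failed = [], []
    for msg in messages:
        try:
            ok.append(handler(msg))
        except Exception:  # BaseException (e.g. KeyboardInterrupt) still propagates
            log.exception("handler failed on %r", msg)
            failed.append(msg)
    return ok, failed


print(process_all(["1", "x", "3"], int))  # ([1, 3], ['x'])
```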
People credit garbage collection in Java with making applications and libraries much easier to compose. I personally believe that Java's error handling and behavior containment is also a big part of it. A library that I call simply cannot crash me or interact with anything except what it's passed. (OK, there are some exceptions to these rules, such as code that calls into native code, or running out of heap space, or weird dynamic/reflective stuff, but (i) you can avoid most of them in practice (ii) they don't come up much anyway (iii) exceptions to the rules don't tend to be a problem.)
Application crashes like OOM are also problems that we solve in other layers: if they happen, it means the software is not correct or not tuned correctly, and the crash of an application at this level is handled by the routing layer on top. We don't consider application crashes as something that we need to worry about on an ongoing basis; it's more of a QA issue during development. We do not tend to have systems though where "this particular machine must be available at all times".
None of this specifically supports distributed application development along with concurrency, beyond making message passing easy within the app. That's where client libraries like Netty or frameworks like Akka go much further.
It's not trivial, but Java certainly supports hot code swapping. How hard it is to use depends on how much you want it to do, like preserving state, etc. For stateless services it's not hard to load the new version using a new class loader so it lives side-by-side with the old one, then kill the old one (automatically) once all its requests have completed, and let the old code be collected by the GC.
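Python has no class loaders, but the same side-by-side idea can be sketched: build each version as a fresh module object, route new requests to the newest one, and let in-flight requests finish on whichever handler they already hold. The version sources and names below are hypothetical. A real system would load code from files (for example via `importlib`) rather than from strings.

```python
import types

# Two hypothetical versions of the same stateless service code.
V1 = "def handle(req):\n    return 'v1:' + req\n"
V2 = "def handle(req):\n    return 'v2:' + req\n"


def load_version(name: str, source: str) -> types.ModuleType:
    # Each call builds a fresh module object, loosely like a new class loader:
    # old and new code coexist, and the old module can be garbage-collected
    # once nothing references it any more.
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


class Router:
    def __init__(self, module: types.ModuleType) -> None:
        self.current = module

    def dispatch(self, req: str) -> str:
        handler = self.current.handle  # captured once per request
        return handler(req)

    def swap(self, module: types.ModuleType) -> None:
        self.current = module  # new requests go to the new version


router = Router(load_version("service_v1", V1))
in_flight = router.current.handle  # a request that started before the swap
router.swap(load_version("service_v2", V2))
print(router.dispatch("a"), in_flight("b"))  # v2:a v1:b
```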
It's a neat idea though. Do you know of any frameworks that would make the idea easier to implement? It sounds loosely similar to the Unix availability concept of passing off the listening socket to a new instance of the process, while allowing the old processes to linger until they finish serving their requests.
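On Unix, the socket hand-off works by passing the listening file descriptor over a Unix-domain socket with `SCM_RIGHTS`. Python 3.9+ wraps this as `socket.send_fds`/`socket.recv_fds`. In the sketch below, both "old" and "new" instances live in one process for brevity. In practice they would be separate processes connected by a named Unix socket.

```python
import socket

# Old instance: a listening socket that is already serving traffic.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()

# Channel between old and new instance (a socketpair stands in for it here).
old_side, new_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Old instance hands over the listening descriptor via SCM_RIGHTS.
socket.send_fds(old_side, [b"listener"], [listener.fileno()])

# New instance receives a duplicate descriptor for the same listening socket;
# family and type are auto-detected from the descriptor.
msg, fds, _flags, _addr = socket.recv_fds(new_side, 1024, 1)
inherited = socket.socket(fileno=fds[0])
print(msg, inherited.getsockname() == listener.getsockname())  # b'listener' True
```

After the hand-off, the old instance can close its copy and finish its remaining requests, while the new instance keeps accepting on the same port.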
All other things equal, I favor architectural approaches where I can take any machine offline safely and effortlessly since I tend to need to handle that case anyway for availability reasons.
Orbit for the JVM / Scala by EA (BioWare actually)
https://github.com/electronicarts/orbit
Then there's Orleans by Microsoft for .NET:
https://github.com/dotnet/orleans
They feature a concept called virtual actors, in contrast to other frameworks like Akka for the JVM.
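Roughly, a virtual actor is addressed only by its identity. The runtime activates an instance the first time it is called and may deactivate it when idle, so callers never create or destroy actors explicitly the way they do in Akka. The sketch below is hypothetical and not Orleans' or Orbit's API. In particular, it persists nothing on deactivation.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")


class VirtualActorRuntime(Generic[A]):
    """Hypothetical sketch: callers address actors by key only; the runtime
    activates an instance on first use and may deactivate it later."""

    def __init__(self, factory: Callable[[str], A]) -> None:
        self._factory = factory
        self._active: dict[str, A] = {}

    def get(self, key: str) -> A:
        if key not in self._active:
            self._active[key] = self._factory(key)  # implicit activation
        return self._active[key]

    def deactivate(self, key: str) -> None:
        self._active.pop(key, None)  # a real runtime would persist state first


class Player:
    def __init__(self, player_id: str) -> None:
        self.player_id = player_id
        self.score = 0


players = VirtualActorRuntime(Player)
players.get("alice").score += 10   # first call activates "alice" implicitly
print(players.get("alice").score)  # 10
players.deactivate("alice")
print(players.get("alice").score)  # 0 (reactivated fresh; nothing was persisted)
```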
So, Erlang/OTP are correct. The rest, so far, are not correct.
What I'm learning here is that people don't understand what Erlang/OTP is.
- Erlang, the language and system, was designed for highly concurrent and fault tolerant systems. In many ways these were more important than raw throughput in that if it could not handle the massive concurrency and build truly fault-tolerant systems then it wasn't interesting. At all.
- Same with pre-emptive scheduling: the system had to be non-blocking. Again, if it wasn't, then it wasn't interesting. Yes, you can provide primitives to allow the programmers to do it, but this makes things difficult and unreliable.
- The language design was focussed on these types of systems and on implementing the types of architectures which OTP supports.
- From Erlang's point of view, using OS threads as a base for Erlang processes is not an option. They are way too heavy and you can't have enough of them to be interesting. 10k processes are child's play, 100k processes is starting to get interesting, and 1M-process production systems exist, for example WhatsApp.
- For Erlang, processes are the basic building blocks, in a similar way to objects in an OO language. What would a Java programmer say if they were told that they couldn't have more than 1000 objects?
- Having the Erlang VM use all the cores, or as many as you want, by default is just the natural way to do things. If I had to do any form of restructuring of my system because I was running on 4, 8, 16 or 32 cores, I would consider that to be intolerable and so primitive that I would wonder what the implementors were thinking of.
- While the Erlang syntax is different (which functional language doesn't have a different syntax?) it is actually very concise and consistent, this by design. The elixir syntax is "Ruby influenced" and more complex and feature filled. Which you prefer is up to you.
- I am very fond of Lisp, so there is at least one native implementation of Lisp on the Erlang VM: LFE (Lisp Flavoured Erlang), http://lfe.io/ and https://github.com/rvirding/lfe.
- There is nothing which you can do in one which you can't do in the others, after all they run on the same VM and you can easily combine them and use them together.
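To give a feel for "100k processes" outside the BEAM, here is a Python asyncio sketch that spawns 100,000 tasks. Treat it only as an illustration of how cheap user-space tasks are compared with OS threads. asyncio tasks are cooperative and single-threaded, with no preemption, no per-task heaps, and no use of multiple cores, so it provides none of the properties listed above.

```python
import asyncio


async def worker(i: int, results: asyncio.Queue) -> None:
    await asyncio.sleep(0)  # yield to the scheduler once
    await results.put(i)


async def main(n: int) -> int:
    results: asyncio.Queue = asyncio.Queue()
    tasks = [asyncio.create_task(worker(i, results)) for i in range(n)]
    await asyncio.gather(*tasks)
    total = 0
    while not results.empty():
        total += results.get_nowait()
    return total


total = asyncio.run(main(100_000))
print(total)  # 4999950000
```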
That's about all for the moment,
Robert
* Project Iris http://iris.karalabe.com/
* Go Circuit https://github.com/gocircuit/circuit
* NSQ http://nsq.io/
* Consul https://www.consul.io/
* SkyNet https://github.com/skynetservices/skynet
* Grace https://github.com/facebookgo/grace
Each has its pros and cons, and they vary in quality.
Replicating Erlang processes -> usually you use goroutines. When that level fails you fail the whole process and restart it using some supervisor. You may need computer level failover as well, depending on the requirements.
For the servers/protocols part of OTP, the stdlib is usually sufficient to get things running, although it may need some additions for rarer encodings.
For machine-level deployment there are cluster managers and platforms, e.g. Kubernetes, AWS, Google Cloud, etc.
Regarding debugging/monitoring I haven't seen anything that is close to Erlang.
Basically, I have no idea what you are doing and cannot recommend anything in particular. For the full feature set there is no single replacement. If you need OTP, then use OTP.
[0] https://launchpad.net/candygram - main page is on SourceForge which is down at the moment, and should be avoided even when it is up.