Edit: I think I found it [1]. That iteration was from 2002. I'd be curious to see if his opinion has evolved in 12 years.
Also, interesting to see game developer Chris Hecker [2] in that thread.
[1] http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/threa... search for "Why systhreads?" and "Xavier Leroy". Also, damn their website's broken.
Better link, I wish GMane had better Googlejuice: http://thread.gmane.org/gmane.comp.lang.caml.general/16381/f...
> To make things worse, non-blocking I/O is done completely differently
> under Unix and under Win32. I'm not even sure Win32 provides enough
> support for async I/O to write a real user-level scheduler.
Sigh. VMS got the link between processes, threads, I/O and waitable events (specifically, tying the completion of future I/O to subsequent computation) right from day one. And by virtue of Cutler, so did NT, and thus Windows.
UNIX did not. The core concept of separating the work (computation to be done after an event occurs) from the worker[1] (the thread that performs the work) is absent; the manifestation of that is the lack of good, completion-oriented asynchronous I/O primitives. Instead of being able to say to the kernel "here, do this, then let me know when you're done"[2] and moving on to the next piece of work in the queue, you have to do the elaborate non-blocking multiplex dance for socket I/O, palm file I/O off onto a separate set of threads that can block (or do AIO), and generally manage all threading and concurrency primitives yourself.
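To make the work/worker split concrete, here is a minimal user-space sketch in Python. It is only a loose analogy, not the VMS/NT kernel mechanism: a thread pool stands in for the workers, and a completion callback stands in for "let me know when you're done". The names `read_block`, `on_complete` and `run` are made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

completed = []
lock = threading.Lock()

def read_block(size):
    # Stand-in for an I/O request; a real program would read a file or socket.
    return bytes(size)

def on_complete(future):
    # The "work": runs once the request has finished, normally on the pool
    # thread that finished it, not on a thread dedicated to this request.
    with lock:
        completed.append(len(future.result()))

def run(sizes, workers=4):
    completed.clear()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for size in sizes:
            # Hand the request over and move on without waiting for it.
            pool.submit(read_block, size).add_done_callback(on_complete)
    # Leaving the with-block waits for every submitted request to finish.
    return sorted(completed)

print(run([1, 2, 3]))
```

The submitting code never blocks on an individual request; it only says what should happen on completion, which is the shape the comment above is describing.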
It took me ten years of UNIX systems programming to suddenly grasp the elegance of the VMS/NT/Windows approach a few years ago. It provides you with everything you need to optimally exploit all your cores for work that is both heavily compute bound and I/O bound.
It has been fascinating to see the difference in performance between Linux and Windows in practice with PyParallel when Windows kernel primitives are exploited properly:
https://speakerdeck.com/trent/pyparallel-pycon-2015-language....
And more recently, with 10GbE hardware at home:
Linux lwan (the top performer on the TechEmpower Framework Benchmarks):
  [trent@zebra/ttypts/1(~s/wrk)%] time ./wrk --timeout 120 --latency -c 256 -t 12 -d 30 http://10.0.0.2:8080/plaintext
  Running 30s test @ http://10.0.0.2:8080/plaintext
    12 threads and 256 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency     5.34ms    7.46ms 197.13ms   82.40%
      Req/Sec    14.41k   364.49    18.82k    76.61%
    Latency Distribution
       50%  398.00us
       75%    9.01ms
       90%   17.50ms
       99%   28.03ms
    5178617 requests in 30.10s, 0.93GB read
  Requests/sec: 172048.49
  Transfer/sec:     31.67MB

Windows PyParallel:

  [trent@zebra/ttypts/1(~s/wrk)%] time ./wrk --timeout 120 --latency -c 256 -t 12 -d 30 http://10.0.0.2:8080/plaintext
  Running 30s test @ http://10.0.0.2:8080/plaintext
    12 threads and 256 connections
    Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency     1.52ms    9.38ms 492.43ms   99.33%
      Req/Sec    18.37k     1.01k   22.75k    73.50%
    Latency Distribution
       50%    1.09ms
       75%    1.28ms
       90%    1.56ms
       99%    5.18ms
    6598900 requests in 30.10s, 1.03GB read
  Requests/sec: 219236.69
  Transfer/sec:     34.92MB
  ./wrk --timeout 120 --latency -c 256 -t 12 -d 30 106.30s user 138.87s system 814% cpu 30.114 total
[1]: https://speakerdeck.com/trent/parallelism-and-concurrency-wi...
[2]: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-...
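For anyone who wants the two wrk runs side by side, a small sketch of the arithmetic. The figures are copied from the output above; the helper name `relative_gain` is just for illustration.

```python
# Figures copied from the two wrk runs above.
lwan = {"rps": 172048.49, "p50_ms": 0.398, "p99_ms": 28.03}
pyparallel = {"rps": 219236.69, "p50_ms": 1.09, "p99_ms": 5.18}

def relative_gain(baseline, candidate):
    # Fractional improvement of candidate over baseline.
    return (candidate - baseline) / baseline

throughput_gain = relative_gain(lwan["rps"], pyparallel["rps"])
p99_ratio = lwan["p99_ms"] / pyparallel["p99_ms"]
p50_ratio = pyparallel["p50_ms"] / lwan["p50_ms"]

print(f"throughput gain: {throughput_gain:.1%}")
print(f"lwan p99 / PyParallel p99: {p99_ratio:.1f}x")
print(f"PyParallel p50 / lwan p50: {p50_ratio:.1f}x")
```

That works out to roughly 27% more requests per second for the PyParallel run and a 99th-percentile latency about 5.4 times lower, although lwan has the lower median (398us vs 1.09ms). This is a single pair of runs on one home setup, so treat it as indicative only.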
It'd be interesting if running under the VMS/NT thread/fork model could be seen as a reason to deploy some apps on ReactOS rather than Linux/BSD. Would also be interesting if one could see any difference running a multi-core KVM guest on ReactOS vs a Linux/BSD guest/container/jail. Although I suppose one would need to dedicate a hw nic to see any real results (avoiding the host OS/VM scheduler etc)?
Note-to-self: something to play with...
This doesn't mean that Windows' philosophy does not give you optimal performance in PyParallel. It simply means that OCaml had chosen for its low-level system primitives a Unix model and that it was difficult to make a Windows version of the same primitives so that OCaml programmers could write this kind of program portably between Windows and Unix.
NOTE: given the hour in my timezone, I haven't looked up the full post, but I don't think the two quoted sentences have anything to do with the discussion. It seems to me that they assume a multicore (multiprocessor, at the time the post was written) OCaml runtime is not available, and discuss the options for still providing threads. A user-level scheduler is one option for providing threads to OCaml programs without a concurrent OCaml runtime. Another option is to use Windows' native threads and superior philosophy for blocking primitives to run each OCaml thread as a native thread (although at most one of these will be running at any given time; all the others will be waiting on the heap mutex).
OCaml ended up providing threads under Windows and a Unix-like “Unix” module around 1996-ish, way before the linked discussion. So thanks for the explanation about VMS, but I think it is off-topic, too.
NOTE 2: I have now read the original post. You should, too. It starts with:
> Threads have at least three different purposes:
>
> 1- Parallelism on shared-memory multiprocessors.
> 2- Overlapping I/O and computation (while a thread is blocked on a network
> read, other threads may proceed).
> 3- Supporting the "coroutine" programming style
> (e.g. if a program has a GUI but performs long computations,
> using threads is a nicer way to structure the program than
> trying to wrap the long computation around the GUI event loop).
>
> The goals of OCaml threads are (2) and (3) but not (1) (for reasons
> that I'll get into later)
What makes it relevant to the current discussion is (1), but Xavier is discussing (2) and (3) at the time of the quote you chose to take out of context.
> What about hyperthreading? Well, I believe it's the last convulsive movement of SMP's corpse :-)
Oh how things have changed. This was written before it was clear just how much of a disaster the P4 was, so it was a pretty reasonable position at the time.
"In summary: there is no SMP support in OCaml, and it is very very unlikely that there will ever be. If you're into parallelism, better investigate message-passing interfaces."
Now that Rust is around and supporting multicore, that's probably where I'll be investing my time.
I'd love to hear feedback from people who have used both Rust and OCaml.
In short, OCaml is a mature language that has been used for decades in commercial applications. I feel OCaml is the next progression for the people that got excited about distributed systems via the Erlang path and want more of the safety and reasoning that comes from a strongly/statically-typed language like OCaml. Rust may or may not take off, but I am confident OCaml will remain viable for the foreseeable future, and probably gain slow, but steady popularity as engineers see all the cool things you can do like MirageOS: http://openmirage.org/
Last I checked, there was decent third-party library support in Batteries. I imagine it would be painful if you were to use Batteries' "UTF8.t" string type and had to interface with some other library that used "string" or some other string solution (like Camomile)?
There are some areas where OCaml is more advanced than F# (functors, the codegen from the optimized compiler, lack of the msbuild barf sandwich, less hacky on non-Windows platforms), but there are also plenty of areas where F# is more advanced than OCaml (computation expressions, code interoperability, real 32 and 64 bit integers, agents, multicore runtime).
I would say that if OCaml was ideal for you except for the lack of parallelism, then you should definitely check out F# before you go all the way to Rust. Rust is awesome and for the right use case you should use it, but F# is a lot closer to OCaml than Rust is.
On the other hand, Rust (like Erlang) reinvents and wraps a lot of these calls in ways that are not immediately obvious. (Or at least, not AS immediately obvious as they are in OCaml.)
This is such a tremendous aid because there are nearly limitless documents and examples of the Unix API.
* network effect / hype
One of the main reasons. It sounds lame, but I can justify to a customer re-writing a project in Rust because they've heard about it and they will be able to hire people who have either used it or at the very least will be interested in learning it. Also, the network/hype means that we are going to see good libraries emerge fairly quickly.
* multicore support
A year ago I was ready to move to OCaml, bought books, started to learn it, but the multicore situation was worse than Python's. Today we hear that "there is a good chance it's going to get multicore support". In Rust, it's already there, and it's not an afterthought.
Rust also happened to be slightly faster for most things according to micro-benchmarks, but we know how reliable those are, and we're not talking orders of magnitude here, although it is early and we can hope it gets even better.
The implementation philosophy of the two languages is pretty different, despite being superficially similar in terms of syntax. OCaml is pretty predictable -- you can look at code and have a pretty good idea of what kind of code the compiler is going to generate.
Haskell is a lot more opaque. Between laziness and a more rigid type system, GHC can do some pretty crazy code transformations. In general, this is a good thing, but it can make performance questions harder to figure out.
I think that OCaml is easier to learn, but Haskell is more fun, and I've learned more from using it.
The can't-reuse-field-names thing is annoying, but claiming that it "isn't a serious language" because they made a design choice that doesn't meet your exact expectations seems kind of closed-minded to me.
That said, working with modules is a lot nicer in OCaml than in Haskell, so it's a less painful solution.
And what poorly thought out semantics?
My understanding is that GC is hard with multithreading, particularly in a functional language where it's going to do some heavy lifting and needs to be very performant.
Or were you referring to OCaml in particular?
Also, if you can rely on the data structures stored in your heap to be persistent, then you can tune the GC for it. The problem is that you need to make assumptions about the life cycle of those data structures. For example, the persistent data structures used in Scala or Clojure can be pretty heavy for the JVM's garbage collectors because they tend to produce junk that is neither short-term nor long-term, thus invalidating the assumptions the JVM was built on. And generally that's OK, because the JVM's GCs can cope pretty well, and if the need to optimize arises, well, both Scala and Clojure are hybrids (just like OCaml), so you can just use mutable stuff if profiling shows problems. So the theory is known and a decent concurrent GC can be built.
in my original post in r/ocaml: https://www.reddit.com/r/ocaml/comments/36ninh/403_scheduled...
in the repost in r/programming: https://www.reddit.com/r/programming/comments/36ppx0/ocaml_4...
You can make OS-level threads, but they can't both be running at the same time due to the GIL (Global Interpreter Lock). Then why are they even there, you might ask? Because it allows you to do a blocking call on a thread and to keep executing other stuff in the main thread. Other languages that have a GIL (and the same restriction) are JavaScript (including Node.js), Ruby and Python.
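Since Python is named as having the same restriction, here is a small Python sketch of why threads are still useful under a global lock: a blocked thread releases the lock, so several blocking calls overlap. `time.sleep` stands in for a blocking read, and the function names are invented for the example.

```python
import threading
import time

def blocking_call(seconds, results, key):
    # time.sleep releases the GIL while it waits, like a blocking read would.
    time.sleep(seconds)
    results[key] = seconds

def overlap(delay=0.3, count=3):
    results = {}
    threads = [
        threading.Thread(target=blocking_call, args=(delay, results, i))
        for i in range(count)
    ]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Total wall time, plus what each thread recorded.
    return time.monotonic() - start, results

elapsed, results = overlap()
print(f"{len(results)} blocking calls finished in {elapsed:.2f}s")
```

Three 0.3s waits finish in roughly 0.3s of wall time rather than 0.9s. Replace the sleep with pure computation and the overlap disappears, which is the concurrency-without-parallelism point.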
Now, IN PRACTICE, things are a bit different. You're never gonna make your own thread to block on things. You're gonna use Lwt to manage all your concurrency so you can do tons of blocking stuff at the same time and combine the tasks nicely without ending up in a Node.js-style "callback hell".
But still, even with tons of concurrency, you don't have parallelism. It's all you need for 98% of your programs, but if you then need to do heavy number-crunching it won't be enough. This is the exact same situation that happens in Node.js, Python, etc, except that OCaml is massively faster than those languages, so even some CPU-bound work is acceptable because OCaml is really performant.
Currently, there's 2 options if you wanna do CPU-bound work: you can use ctypes to call C code easily (from Lwt_preemptive) and then release the lock from within C with caml_release_runtime_system(), so your C code will be truly parallel (and running in the thread pool automatically managed by Lwt_preemptive), and you can call caml_acquire_runtime_system() before returning the result back to OCaml to get the lock back and merge back with the normal code.
The second option is to do an oldschool fork() and communicate with message-passing. Or have a master that manages workers and communicates with ZMQ, HTTP, TCP, IPC, etc. Or use a library that does it all for you like parmap, Async Parallel, etc etc.
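A rough sketch of the fork-and-message-passing option, written in Python rather than OCaml and assuming a Unix system. One child is forked per chunk of work and sends its partial result back over a pipe; `sum_squares_forked` is a hypothetical name, and pickle is used here only as a convenient wire format.

```python
import os
import pickle

def sum_squares_forked(chunks):
    children = []
    for chunk in chunks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute a partial result and send it back as a message.
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "wb") as writer:
                    pickle.dump(sum(x * x for x in chunk), writer)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        # Parent: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as reader:
            total += pickle.load(reader)
        os.waitpid(pid, 0)
    return total

print(sum_squares_forked([[1, 2], [3, 4]]))
```

Every result crosses the pipe in serialized form. That is cheap for one integer and gets expensive for large data structures.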
What this "multicore support" means is that you'll be able to have threads in the same process that run in parallel because the GIL is going away. In practice it'll probably be implemented directly into Lwt so you'll be able to do something with Lwt_preemptive and just tell it to run some function in a separate thread and then use >>= to handle its result. It's gonna be simpler than both options I described above.
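The "run a function in a separate thread, then handle its result" shape can be sketched with Python's asyncio as a stand-in. This is not Lwt and not the planned OCaml API: `await` plays the role of `>>=`, and `crunch` and `main` are made-up names. Under CPython the GIL still serializes the computation, so only the structure carries over.

```python
import asyncio

def crunch(n):
    # CPU-bound placeholder: sum of squares below n.
    return sum(i * i for i in range(n))

async def main(n):
    loop = asyncio.get_running_loop()
    # Hand the function to the default thread pool; the event loop stays
    # free to run other tasks until the result comes back.
    total = await loop.run_in_executor(None, crunch, n)
    # Continuation: runs back on the event loop with the result in hand.
    return f"sum of squares below {n} = {total}"

print(asyncio.run(main(10)))
```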
Again, more technical information is available in my r/ocaml post
I work on the Hack language typechecker at Facebook. The typechecker is written in OCaml, and since it needs to operate on the scale of Facebook's codebase (tens of millions of lines of code), it's a pretty performance-sensitive program. We needed real parallelism, but doing it with fork() and IPC was too costly for us, both in terms of storage (if you aren't careful you end up duplicating a bunch of data) and CPU (serializing/deserializing OCaml data structures to send over IPC is CPU-intensive).
We ended up doing something somewhat more interesting. Before we fork(), we mmap a MAP_ANON|MAP_SHARED region of memory -- that region will be backed by the same physical frames in each child after we fork, so writes to it in one child process will be visible in the others. We use a little bit of C code to safely manage the shared-memory concurrency here.
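The mmap-before-fork idea can be shown in a few lines of Python on a Unix system. This is only a sketch of the mechanism, not the Hack typechecker's code: each child writes to its own 8-byte slot, so no locking is needed here, whereas real shared structures would need the kind of careful synchronization mentioned above. `shared_slots_demo` is a hypothetical name.

```python
import mmap
import os
import struct

SLOT = 8  # bytes per worker slot (one signed 64-bit integer)

def shared_slots_demo(workers=4):
    # Anonymous shared mapping created BEFORE fork, so every child sees
    # the same physical pages as the parent.
    region = mmap.mmap(-1, SLOT * workers,
                       flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    pids = []
    for i in range(workers):
        pid = os.fork()
        if pid == 0:
            try:
                # Child: write straight into shared memory, no serialization.
                struct.pack_into("q", region, SLOT * i, (i + 1) * 100)
            finally:
                os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    # Parent: the children's writes are visible without any IPC.
    values = [struct.unpack_from("q", region, SLOT * i)[0]
              for i in range(workers)]
    region.close()
    return values

print(shared_slots_demo(3))
```

Compared with pipes, nothing is copied or serialized on the way back to the parent, which is the storage and CPU saving described above.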
The code for this is all open source (along with the rest of the typechecker, HHVM runtime, etc) if you want to take a look: https://github.com/facebook/hhvm/blob/master/hphp/hack/src/h...
I also gave a tech talk a while ago on internals of the type system and typechecker; the latter part starts here: https://www.youtube.com/watch?v=aN22-V-b8RM&feature=youtu.be...
Not quite true; JS just doesn't support threads at all. It's asynchronous and single-threaded. In node.js's case, an event loop uses a system call like epoll or kqueue to wait for many events at a time, and dispatches those events to the correct callbacks.
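For readers who haven't seen one, a single-threaded event loop of that kind fits in a few lines. The sketch below uses Python's `selectors` module, which picks epoll, kqueue or a fallback depending on the platform; socket pairs stand in for network connections, and `run_event_loop` is an invented name.

```python
import selectors
import socket

def run_event_loop(messages):
    sel = selectors.DefaultSelector()  # epoll, kqueue, ... per platform
    received = []
    pairs = [socket.socketpair() for _ in messages]

    def on_readable(sock):
        # Callback dispatched by the loop when this socket has data.
        received.append(sock.recv(1024).decode())
        sel.unregister(sock)

    for (reader, writer), message in zip(pairs, messages):
        reader.setblocking(False)
        # Register interest in readability, with the callback as payload.
        sel.register(reader, selectors.EVENT_READ, on_readable)
        writer.sendall(message.encode())

    # One thread waits on many sockets at once and dispatches callbacks.
    while sel.get_map():
        for key, _events in sel.select():
            key.data(key.fileobj)

    for reader, writer in pairs:
        reader.close()
        writer.close()
    sel.close()
    return sorted(received)

print(run_event_loop(["a", "b", "c"]))
```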
You can do parallelism in JS with Web Workers, and they do use native OS threads, but they lack shared memory, and can only communicate using message passing. So from the perspective of the JS code, they behave more like processes than threads. No GIL, in any case.
The numerical benchmark table in http://julialang.org/ suggests that JavaScript is quite a number crunching beast, within 2x-3x of C performance.
I have some experience with ML, Haskell & Lisp. OCaml is appealing because it is quite efficient and predictable. Does it have the bit of laziness Clojure has that makes functional programming easy with large data?
It is pretty stupid for a language not to have multicore support in 2015. Javascript has it (in its own, somewhat broken way).
No it doesn't. Also OCaml is from a pre-multicore era. Even Erlang wasn't multicore from the start, SMP was added in 2005.
I don't understand why some projects have such bizarre versioning methodology.
So OCaml 4.03.0 is basically 4.3.0 in a Python-esque versioning scheme (remember how many changes were done between Python 2.2.0 and 2.7.0?).
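A quick sketch of that reading, with a hypothetical `parse_version` helper: compared as numbers, the zero-padded and unpadded forms are the same version. One practical effect of the padding (whether or not it is the reason for it) is that plain string sorting keeps working once a component reaches two digits.

```python
def parse_version(text):
    # "4.03.0" -> (4, 3, 0); leading zeros vanish when parsed as integers.
    return tuple(int(part) for part in text.split("."))

print(parse_version("4.03.0") == parse_version("4.3.0"))

# Padded strings sort in version order; unpadded ones do not.
print(sorted(["4.10.0", "4.03.0"]))
print(sorted(["4.10.0", "4.3.0"]))
```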
http://kcsrk.info/ocaml/multicore/2015/05/20/effects-multico...
The core idea is incredibly exciting (to us, anyway). Rather than baking in a specific multicore scheduler, we're allowing pluggable schedulers written in OCaml. They use algebraic effects to allow an independent scheduler to compose concurrency among OCaml threads. This will ensure that the OCaml runtime remains lean, and even allow applications to define their own strategies for concurrent scheduling.
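To give a feel for "schedulers as ordinary library code", here is a toy sketch using Python generators as a much weaker stand-in for algebraic effects. Tasks perform `Fork` and `Yield` effects by yielding them, and the scheduler is just a function that handles those effects; swapping the policy changes the interleaving without touching the tasks. All names are invented, and none of this reflects the actual OCaml multicore design.

```python
from collections import deque

class Yield:
    """Effect: the task offers to let something else run."""

class Fork:
    """Effect: ask the scheduler to start another task."""
    def __init__(self, task):
        self.task = task

def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield Yield()

def main(log):
    yield Fork(worker("a", 2, log))
    yield Fork(worker("b", 2, log))
    log.append("main:done")

def run(root, take):
    # The scheduler is plain user code; 'take' is the pluggable policy
    # that decides which ready task runs next.
    ready = deque([root])
    while ready:
        task = take(ready)
        try:
            effect = next(task)  # run the task until it performs an effect
        except StopIteration:
            continue             # task finished
        if isinstance(effect, Fork):
            ready.append(effect.task)
        ready.append(task)

fifo_log, lifo_log = [], []
run(main(fifo_log), deque.popleft)  # round-robin policy
run(main(lifo_log), deque.pop)      # most-recently-queued-first policy
print(fifo_log)
print(lifo_log)
```

The same tasks produce different interleavings under the two policies, which is the pluggability being described. Real effect handlers are far more general than generators, since a handler can capture the continuation from arbitrarily deep in the call stack.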