I work on the Hack language typechecker at Facebook. The typechecker is written in OCaml, and since it needs to operate on the scale of Facebook's codebase (tens of millions of lines of code), it's a pretty performance-sensitive program. We needed real parallelism, but doing it with fork() and IPC was too costly for us, both in terms of storage (if you aren't careful you end up duplicating a bunch of data) and CPU (serializing/deserializing OCaml data structures to send over IPC is CPU-intensive).
We ended up doing something somewhat more interesting. Before we fork(), we mmap a MAP_ANON|MAP_SHARED region of memory -- that region will be backed by the same physical frames in each child after we fork, so writes to it in one child process will be visible in the others. We use a little bit of C code to safely manage the shared-memory concurrency here.
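To make the mechanism concrete, here is a minimal sketch of the mmap-before-fork idea. It is not the typechecker's actual code: the function name `shared_counter_demo` and the single shared counter are made up for illustration, and I'm assuming a platform where `atomic_long` is lock-free, so C11 atomics work across processes.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON under strict -std= modes on glibc */
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork `nchildren` workers that each add `increments`
 * to a single counter living in shared memory.
 * Returns the final count, or -1 on error. */
long shared_counter_demo(int nchildren, long increments)
{
    /* Map the region BEFORE forking so every child inherits a mapping
     * backed by the same physical frames. */
    atomic_long *counter = mmap(NULL, sizeof *counter,
                                PROT_READ | PROT_WRITE,
                                MAP_ANON | MAP_SHARED, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    atomic_init(counter, 0);

    int started = 0;
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break; /* stop spawning; still reap the children we started */
        if (pid == 0) {
            /* Child: atomic increments, so concurrent writers don't
             * lose updates. */
            for (long j = 0; j < increments; j++)
                atomic_fetch_add(counter, 1);
            _exit(0);
        }
        started++;
    }

    /* Parent: wait for every child that was actually started. */
    for (int i = 0; i < started; i++)
        wait(NULL);

    /* The children's writes are visible here, with no serialization or IPC. */
    long total = atomic_load(counter);
    munmap(counter, sizeof *counter);
    return started == nchildren ? total : -1;
}
```

Calling `shared_counter_demo(4, 1000)` forks four children that all write to the same counter. With a private mapping (`MAP_PRIVATE`) each child would get its own copy-on-write page and the parent would still read 0. A real shared heap needs a lot more than one atomic counter, of course; this only shows the sharing itself.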
The code for this is all open source (along with the rest of the typechecker, the HHVM runtime, etc.) if you want to take a look: https://github.com/facebook/hhvm/blob/master/hphp/hack/src/h...
I also gave a tech talk a while ago on internals of the type system and typechecker; the latter part starts here: https://www.youtube.com/watch?v=aN22-V-b8RM&feature=youtu.be...