First of all, there is no enormous complication. Clustering in Node is super easy: incoming connections are handed off automatically to one of the processes in the process pool. The benchmark should at least have done that.
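To show how little code that takes, here is a minimal sketch using Node's built-in `cluster` module. It assumes Node 16 or later (for `cluster.isPrimary`). The helper names `workerCount` and `startClustered`, the port 8000, and the `CLUSTER_DEMO` environment variable are all made up for illustration and are not from the benchmark being discussed.

```javascript
// Minimal sketch of Node's built-in cluster module (assumes Node 16+).
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: use the requested worker count if it is a positive
// integer, otherwise fall back to one worker per CPU core (at least one).
function workerCount(requested) {
  if (Number.isInteger(requested) && requested > 0) return requested;
  return Math.max(1, os.cpus().length);
}

// Hypothetical helper: the primary process forks the workers, and every
// worker listens on the same port. The cluster module takes care of
// handing incoming connections to the workers.
function startClustered(port, requested) {
  if (cluster.isPrimary) {
    const n = workerCount(requested);
    for (let i = 0; i < n; i++) cluster.fork();
    // Replace a worker that dies so the pool stays full.
    cluster.on('exit', (worker) => {
      console.error(`worker ${worker.process.pid} exited, forking a new one`);
      cluster.fork();
    });
  } else {
    http.createServer((req, res) => {
      res.end(`handled by pid ${process.pid}\n`);
    }).listen(port);
  }
}

// Only start the server when explicitly asked to,
// e.g. CLUSTER_DEMO=1 node server.js
if (process.env.CLUSTER_DEMO === '1') startClustered(8000);
```

With the default scheduling policy on non-Windows platforms, the primary accepts connections and distributes them round-robin across the workers, so the request handler itself needs no changes.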
Secondly, the service presented there doesn't need shared-memory concurrency at all. The same goes for the vast majority of web service problems: they either communicate a lot or do a bunch of CPU-intensive work, but rarely both.
Finally, when you use shared-memory concurrency/parallelism to solve web service problems, you risk outgrowing the resources of a single machine. Then you are back to serialising things and sending them through an even slower channel: the network.
Haskell also has its problems with compute-intensive work, just different ones. For example, laziness makes reasoning about performance more difficult. Space leaks are fairly easy to create unless you know the common gotchas. Most naive/idiomatic Haskell code performs several orders of magnitude worse than what's possible with optimized code. And so on.