This is true, and I've read some of the papers describing said structures and was blown away by the cleverness. However (and I don't have a source here, so please forgive me if I'm just totally wrong) I have watched many of Rich Hickey's hour+ long talks, and in one of them, I distinctly remember him saying that his vectors were "Within an order of magnitude of mutable vectors performance-wise, which is good enough for me".
I was... put off by that.
EDIT: Now that I think about it, it might have been Maps that he was talking about. I are not good at remember.
Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.
Single threaded execution is a computational dead end. If you want to go faster, you have to parallelize, be it on a single system or on a cloud service. Clojure's persistent data structures ease this. That the persistent data structures also have canonical serializations ALSO ease this.
I like Clojure, spent some time messing around with it a couple years ago and will one day actually use it for something, probably involving complex configuration where code-as-data really shines along with concurrency/performance.
But if you're talking about working a ho-hum vector with 100-10k entries, a linear scan over a mutable, contiguous array will typically be faster than the most clever multithreaded code you can come up with, and take up less of the CPU while it's working. 10 cores are a Bad Idea for that kind of work.
Amdahl's law tells us we should look at larger units of concurrency in our architecture rather than getting all excited about some auto-parallelized map function. At that point, it starts being important how fast the (single-threaded!) individual tasks run.
Break into blocks < CPU cache size, perform multiple stages on each block.
Having all that handy control-flow stuff makes it easier to get the block-oriented behavior you need to maximize performance, which in these cases is all about memory bandwidth.
1. Read lock on the data for the time a thread is using it. This ensures that it is not destructively modified while iterating over it. This is a terrible option if you're using any kind of blocking operation during the lifespan of the read lock. The thread who obtains the lock runs quickly without any kind of penalty. After it obtains the lock, which might have taken an extremely long time. Especially if some OTHER thread was blocking with a read lock held.
2. Read lock long enough to copy the data. Work with your copy in isolation. You have to incur the cost of the linear copy, this might or might not be less than the cost of performing the work you actually want to do, but if it's close, your run time just went up 2x.
- Brief caveat: Any explicit locking scheme subjects you to risks from deadlocking. This is where complex multithreaded applications develop explicit locking orders from, and what can make large codebases difficult to work in.
3. Don't read lock, hope for the best. This can work better if you have a means to detect concurrent modification and restart. You might even get the correct behavior for 99% of cases.
4. Work with immutable data structures that cannot be destructively modified out from under you. Immutable data is computationally cheaper to read in the face of parallelism than every other option. It is more expensive to write to. What do your read and write loads look like?
- Also please keep in mind that while Clojure provides immutable persistent data structures out of the box and its reader generates them for all the canonical serializations, it does not take away Java's destructively modifiable types
And this is, in my opinion, the best possible argument in favor of using Clojure. Off the top of my head, I remember Rich Hickey saying something about how he developed Clojure due to his irritation at working on a huge project that wrote code to handle thousands upon thousands of interconnected nodes. That makes perfect sense.
However... writing a web app with Clojure, at least to me, doesn't.
Why not? I'm serious here, if you're serving thousands of clients in a web app, why wouldn't you want parallelism? I mean, sure at small loads you don't need it, but what about scaling up?
Also, I find using compojure for web app development to be an absolute dream. I might need to get out more (my day job is java web apps), but I love the ability to iterate rapidly in the repl on a web app without having to restart my JVM.
Also, there's a new(er) feature called transients that can allow you to use mutable data structures where needed for performance without changing the structure of your code: (http://clojure.org/transients).
And you can drop down to mutable Java data structures, if you need to: they are fully supported by Clojure, just not the default.
In general, Clojure's approach is to be thread safe and purely functional by default, but it's not too difficult at all to drop out of that for particular parts of your code you've identified as bottlenecks.
For example, I was working on a system a few weeks back that involves loading several-hundred GB CSV files. I wrote it in a functional style, with several map and filter operations over each of the rows. I was able to program the logic viewing the entire CSV file as a single lazy sequence to transform.
After profiling I ended up using Java arrays instead of Clojure vectors as the primary data format for records in the system: given the fact that this app was IO-bound, this resulted in about 20% improvement for that particular use case.
However, because Clojure is capable of treating arrays as sequences polymorphically, the change required very few modifications to my code and I was able to maintain a functional style.
Except for the map implementation used, the code is identical.
I ran each test six times, discard the first three (to allow the JVM to warm up) and took the average result time of the other three.
The results on my 2010 MBP:
java.util.TreeMap: 8573ms
java.util.HashMap: 3243ms
Clojure immutable hashmap: 7909ms
Clojure immutable treemap: 21248ms
Clojure hashmap (using transients): 5113ms
Data!
There is a cultural distinction to be aware of here between Lisp programmers and C/C++ programmers. C/C++ programmers tend to be very "layer oriented." They assume everything will be built on top of built-in constructs like arrays, so they try to make it as fast as possible. But Lisp has historically been more of a "bag of stuff." It offers lists as the default "go-to" data structure, which has a lot of positive qualities. Mutable arrays are a separate data type that you can use if you need the speed. I think Clojure inherits some of the same thinking. Raw JVM arrays are always there if you need them, but the defaults are optimized for things other than raw speed.
It may very well be that you need that order of magnitude, but if you don't then you might be missing out something you might have really enjoyed otherwise.