undefined | Better HN

0 pointsweareconvo13y ago0 comments

> Clojure has some very clever immutable data structures based on Phil Bagwell's work [1] that help mitigate the slow down.

This is true, and I've read some of the papers describing said structures and was blown away by the cleverness. However (and I don't have a source here, so please forgive me if I'm just totally wrong) I have watched many of Rich Hickey's hour+ long talks, and in one of them, I distinctly remember him saying that his vectors were "Within an order of magnitude of mutable vectors performance-wise, which is good enough for me".

I was... put off by that.

EDIT: Now that I think about it, it might have been Maps that he was talking about. I are not good at remember.

0 comments

13 comments · 4 top-level

aredington13y ago· 6 in thread

Clojure's immutable data structures allow you to scale out across many cores without needing as rigorous locking semantics as a destructively modified version of the same work would need; copy on write presents no contention until you want the new value to be the canonical value.

Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.

Single threaded execution is a computational dead end. If you want to go faster, you have to parallelize, be it on a single system or on a cloud service. Clojure's persistent data structures ease this. That the persistent data structures also have canonical serializations ALSO ease this.

jbooth13y ago

Give me a break, we're talking about vectors here.

I like Clojure, spent some time messing around with it a couple years ago and will one day actually use it for something, probably involving complex configuration where code-as-data really shines along with concurrency/performance.

But if you're talking about working a ho-hum vector with 100-10k entries, a linear scan over a mutable, contiguous array will typically be faster than the most clever multithreaded code you can come up with, and take up less of the CPU while it's working. 10 cores are a Bad Idea for that kind of work.

Amdahl's law tells us we should look at larger units of concurrency in our architecture rather than getting all excited about some auto-parallelized map function. At that point, it starts being important how fast the (single-threaded!) individual tasks run.

rossjudson13y ago

Well, no. A linear scan over a large memory array is going to crap all over the CPU caches if you have to do it more than once.

Break into blocks < CPU cache size, perform multiple stages on each block.

Having all that handy control-flow stuff makes it easier to get the block-oriented behavior you need to maximize performance, which in these cases is all about memory bandwidth.

bromang13y ago

Do immutable data structures data structures really let you scale out so easily? I thought that was something of a myth...

aredington13y ago

"so easily" is poorly defined, but when you are talking about collection data subject to concurrent modification you have a few options for correctness:

1. Read lock on the data for the time a thread is using it. This ensures that it is not destructively modified while iterating over it. This is a terrible option if you're using any kind of blocking operation during the lifespan of the read lock. The thread who obtains the lock runs quickly without any kind of penalty. After it obtains the lock, which might have taken an extremely long time. Especially if some OTHER thread was blocking with a read lock held.

2. Read lock long enough to copy the data. Work with your copy in isolation. You have to incur the cost of the linear copy, this might or might not be less than the cost of performing the work you actually want to do, but if it's close, your run time just went up 2x.

- Brief caveat: Any explicit locking scheme subjects you to risks from deadlocking. This is where complex multithreaded applications develop explicit locking orders from, and what can make large codebases difficult to work in.

3. Don't read lock, hope for the best. This can work better if you have a means to detect concurrent modification and restart. You might even get the correct behavior for 99% of cases.

4. Work with immutable data structures that cannot be destructively modified out from under you. Immutable data is computationally cheaper to read in the face of parallelism than every other option. It is more expensive to write to. What do your read and write loads look like?

- Also please keep in mind that while Clojure provides immutable persistent data structures out of the box and its reader generates them for all the canonical serializations, it does not take away Java's destructively modifiable types

weareconvoOP13y ago

> Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.

And this is, in my opinion, the best possible argument in favor of using Clojure. Off the top of my head, I remember Rich Hickey saying something about how he developed Clojure due to his irritation at working on a huge project that wrote code to handle thousands upon thousands of interconnected nodes. That makes perfect sense.

However... writing a web app with Clojure, at least to me, doesn't.

craigching13y ago

> However... writing a web app with Clojure, at least to me, doesn't.

Why not? I'm serious here, if you're serving thousands of clients in a web app, why wouldn't you want parallelism? I mean, sure at small loads you don't need it, but what about scaling up?

Also, I find using compojure for web app development to be an absolute dream. I might need to get out more (my day job is java web apps), but I love the ability to iterate rapidly in the repl on a web app without having to restart my JVM.

1 more reply

lukev13y ago· 3 in thread

So that was a pretty early talk. I don't have benchmark numbers in hand, but I'm reasonably sure that the performance gap isn't that wide any more.

Also, there's a new(er) feature called transients that can allow you to use mutable data structures where needed for performance without changing the structure of your code: (http://clojure.org/transients).

And you can drop down to mutable Java data structures, if you need to: they are fully supported by Clojure, just not the default.

In general, Clojure's approach is to be thread safe and purely functional by default, but it's not too difficult at all to drop out of that for particular parts of your code you've identified as bottlenecks.

weareconvoOP13y ago

Honestly, one of the things that bothers me the most is how Clojure enthusiasts get hand-wavey when people talk about performance. They will often "prove" its performance characteristics instead of providing solid real-world examples, and will often dismiss other peoples' benchmarks as having something to do with the quality of the Clojure code on which the benchmark is run.

lukev13y ago

Whoa. I didn't dismiss any benchmarks, and I'm happy to back up any of what I've said with real-life experiences.

For example, I was working on a system a few weeks back that involves loading several-hundred GB CSV files. I wrote it in a functional style, with several map and filter operations over each of the rows. I was able to program the logic viewing the entire CSV file as a single lazy sequence to transform.

After profiling I ended up using Java arrays instead of Clojure vectors as the primary data format for records in the system: given the fact that this app was IO-bound, this resulted in about 20% improvement for that particular use case.

However, because Clojure is capable of treating arrays as sequences polymorphically, the change required very few modifications to my code and I was able to maintain a functional style.

lukev13y ago

Also, here's a microbenchmark I just threw together in a REPL, since you ask: The operation is to create a map data structure, then, ten million times, add/assoc a key value. The key is the .toString() of a random integer between 1 and 100,000. The value is the integer. Therefore, the resulting map will end up with a size of 100k elements, and be modified 10m times.

Except for the map implementation used, the code is identical.

I ran each test six times, discard the first three (to allow the JVM to warm up) and took the average result time of the other three.

The results on my 2010 MBP:

java.util.TreeMap: 8573ms

java.util.HashMap: 3243ms

Clojure immutable hashmap: 7909ms

Clojure immutable treemap: 21248ms

Clojure hashmap (using transients): 5113ms

Data!

1 more reply

rayiner13y ago

Mutable vectors are about as fast as it gets (sequential writes to contiguous memory), so within an order of magnitude is still pretty fast. And for the default, catch-all data structure, I'm not convinced that being as fast as possibly possible is ideal. The C++ STL takes this approach, going to great lengths to be as fast as C arrays, but I think it suffers a lot for it in terms of usability.

There is a cultural distinction to be aware of here between Lisp programmers and C/C++ programmers. C/C++ programmers tend to be very "layer oriented." They assume everything will be built on top of built-in constructs like arrays, so they try to make it as fast as possible. But Lisp has historically been more of a "bag of stuff." It offers lists as the default "go-to" data structure, which has a lot of positive qualities. Mutable arrays are a separate data type that you can use if you need the speed. I think Clojure inherits some of the same thinking. Raw JVM arrays are always there if you need them, but the defaults are optimized for things other than raw speed.

DigitalJack13y ago

I totally understand. One thing to watch out for though is "premature optimization" as I've heard it called here on HN.

It may very well be that you need that order of magnitude, but if you don't then you might be missing out something you might have really enjoyed otherwise.

j / k navigate · click thread line to collapse

0 comments

13 comments · 4 top-level

aredington13y ago· 6 in thread

Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.

jbooth13y ago

Give me a break, we're talking about vectors here.

rossjudson13y ago

Well, no. A linear scan over a large memory array is going to crap all over the CPU caches if you have to do it more than once.

Break into blocks < CPU cache size, perform multiple stages on each block.

Having all that handy control-flow stuff makes it easier to get the block-oriented behavior you need to maximize performance, which in these cases is all about memory bandwidth.

bromang13y ago

Do immutable data structures data structures really let you scale out so easily? I thought that was something of a myth...

aredington13y ago

"so easily" is poorly defined, but when you are talking about collection data subject to concurrent modification you have a few options for correctness:

3. Don't read lock, hope for the best. This can work better if you have a means to detect concurrent modification and restart. You might even get the correct behavior for 99% of cases.

weareconvoOP13y ago

> Would I trade a 1000% performance hit on write ops in exchange for the ability to scale out over 10000% as many cores simply? Every day of the week.

However... writing a web app with Clojure, at least to me, doesn't.

craigching13y ago

> However... writing a web app with Clojure, at least to me, doesn't.

Why not? I'm serious here, if you're serving thousands of clients in a web app, why wouldn't you want parallelism? I mean, sure at small loads you don't need it, but what about scaling up?

1 more reply

lukev13y ago· 3 in thread

So that was a pretty early talk. I don't have benchmark numbers in hand, but I'm reasonably sure that the performance gap isn't that wide any more.

And you can drop down to mutable Java data structures, if you need to: they are fully supported by Clojure, just not the default.

weareconvoOP13y ago

lukev13y ago

Whoa. I didn't dismiss any benchmarks, and I'm happy to back up any of what I've said with real-life experiences.

However, because Clojure is capable of treating arrays as sequences polymorphically, the change required very few modifications to my code and I was able to maintain a functional style.

lukev13y ago

Except for the map implementation used, the code is identical.

I ran each test six times, discard the first three (to allow the JVM to warm up) and took the average result time of the other three.

The results on my 2010 MBP:

java.util.TreeMap: 8573ms

java.util.HashMap: 3243ms

Clojure immutable hashmap: 7909ms

Clojure immutable treemap: 21248ms

Clojure hashmap (using transients): 5113ms

Data!

1 more reply

rayiner13y ago

DigitalJack13y ago

I totally understand. One thing to watch out for though is "premature optimization" as I've heard it called here on HN.

It may very well be that you need that order of magnitude, but if you don't then you might be missing out something you might have really enjoyed otherwise.

j / k navigate · click thread line to collapse