The data structure that drives the immutable variables, the Hash-Array Mapped Trie, is efficient in terms of time- and space- complexity, but it's incredibly CPU cache-unfriendly because by nature it's fragmented - it's not contiguous in RAM. Any operation on a HAMT will be steeped in cache misses, which slows you down by a factor of hundreds to thousands. The longer the trie has been around, the more fragmented it is likely to become. Looping over a HAMT is slower than looping over an array by two or three orders of magnitude.
I don't think there is really a solution to that. It's inherent. Clojure will always be slow, because it's not cache friendly. The architecture of your computer means you physically can't speed it up without sacrificing immutability.
How does Clojure's HAMT avoid fragmenting over time?
> Clojure standard library works on arrays just fine, as demonstrated in the article.
Right, but then you don't have immutability - so you lose all the guarantees that you originally had with immutable-by-default.
Are there any other key/value data structures where insertion and retrieval are less than O(n) in complexity, but where the memory layout is better ordered for cache hits during searches? Maybe good old red-black trees?
Worth checking out are BTrees and Hitchhiker trees, but I think a definitive answer will be implementation dependent even in those cases, i.e. one might win out over the other for a specific branch factor or other tune-able parameters
However, not every use case can benefit from cache optimization and you can use other data structures. It’s not super useful to make generalizations about performance that way.
Hope some performance experts can chime in but this might be related to data structure sharing between caches or allocation contention in the JVM.
For a HAMT, With depth (eg) 2, you should be able to prefetch the next node in your iteration while you iterate the current node of 64 elements. Actually iterating through the objects is going to suck but hopefully you can prefetch enough of them too. (Or maybe hotspot could online things enough to improve memory access patterns a bit to take advantage of ILP). There’s still the problem that you can’t read main memory sequentially except that often all the objects in a HAMT were allocated sequentially so you can read main memory sequentially (as allocation is typically just bumping a pointer, allocation-time-locality tends to correspond to address-locality)
> Clojure will always be slow, because it's not cache friendly.
You always have the option to use the java data structures, for the cases this kind of optimization is needed.
Replacing the critical path is an option, but that only works for numpy-style situations where you can cleanly isolate the computational work and detach it from the plumbing. If your app is slow because the inefficiencies have added up across the entire codebase (more common, I would argue), that's not an easy option.
Before you start discussing whether a language is "slow", check if perhaps it is "fast enough".
Apart from trying not to write stupid code, I don't even optimize anymore, because coupled with ridiculously fast servers (bare-metal hardware from Hetzner in my case), it's a waste of time. Things run great, and I'd much rather invest in building complex app features that customers will pay for, and have flexible living code that I can easily rearchitect as needed.
The last time I had to optimize Clojure code was when I wrote a search engine that had to respond to incremental searches in a fairly large E-commerce product databases. But that was also about 12 years ago, so computers were quite a bit slower, and JVMs were not as good as they are today.
I do tend to use transducers quite a bit, because they provide great composability and structure the code nicely, while providing a good performance boost as well (you avoid creating lots of intermediate data structures, which not only saves CPU, but also reduces cache, memory and GC usage) — but I don't worry about performance too much. It's just not a problem that appears on the radar.
Inquiring mind wants to know: how do you deploy your Clojure (Script) webapp on your bare server? Just an "uber war" that you drop there? (bare servers at OVH here btw but I know Hetzner is good too)
The servers are managed using ansible, and I also have a terraform setup for quickly spinning up a development cluster "in the cloud". Ansible takes a list of servers either from a static file or from terraform.
Drawbacks, hmm — I'm not sure, really, there is no silver bullet, and I can't see significant problems with the language itself that another language would magically solve. I'm pretty happy with Clojure and ClojureScript and some advantages are hard to replicate elsewhere (such as business logic code being written only once and then shared between Clojure and ClojureScript).
One thing I noticed is I got spoiled by Rich Hickey solving problems for me, and I'd really like him to continue doing so. For example, transducers and core.async were great, but spec isn't quite finished yet. Also, reporting status and anomalies (as in, status of operations, computations, including errors/exceptions) and passing it through channels or foreign encodings is a problem I had to solve myself. I'm pretty sure Rich could come up with a better solution.
The biggest problems that I encounter everyday are not with the language itself, but with the ecosystem. Many libraries could use more maintenance. And the really big problems aren't language-related at all. For example, RethinkDB shutting down. I also wish I didn't have to write the full FoundationDB integration ("language binding") myself.
From context, it's a language that uses the platform of a host?
IE, JVM languages like Kotlin/Scala/Clojure when Java was the initially intended one? Or CLR languages like VB/F# which are second-class in comparison to C#?
If it is this, the one interesting thing I have to note is that I've found that sometimes Kotlin/Scala will produce better bytecode and performance than what I would have written naively by hand.
Just due to them emitting horror-inducing code/optimizations that is great for performance but never meant for human eyes.
Language built for platform vs language ported-to platform
- Many new languages still take 'Language as platform' approach
- When ported, have platform-on-platform issues
- Memory management, type-system, threading issues
- Library duplication
- If original language based on C, some extension libraries written in C don’t come over
There's a lot of time where the core team says, Java already does this well, just use the one from Java.- assume the sliding window transducer existed in the standard library (there's an open jira for it)
- assume the programmer is not an expert, but is familiar with stuff like boxed maths in Clojure, which isn't hard to work around, well documented [1], and has a known performance impact
- assume the programmer will avoid using `last`
- assume the programmer is comfortable using transducers
In short, that the transducer exists and the programmer has at least 6 months of experience programming in Clojure full time.
Will you consider the resulting implementation complicated, in depth or non idiomatic? There's always the possibility I'm just used to it so it comes naturally to me, but as the title suggests, I consider the result, before dropping down to arrays, way more idiomatic than the initial implementation.
I don't begrudge CL's speed and implementation, it's the readiness with which Clojure is dismissed as being slow with only surface familiarity with it.
I'm a physicist and when checking out CL in the past, it didn't feel like scientific computing was a huge thing in CL. You intrigue me with your post though. Do you have any resources for me to check out?
* unmatched interactive development which could be used for exploration of problems such as dynamical and complex systems
* although it is a very high level language you can perform low level optimizations in common lisp itself
* extremely powerful macro system that allows you to develop in a declarative style to reflect your problem domain. for example you can program in a syntax that is natural to general or special relativity, quantum mechanics, etc.
* fast and free implementations such as SBCL
* the language is standardized so your code will not need to be unnecessarily maintained (this is good for university professors that have repeating long-running courses)
i am not sure what type of references you are asking for but i will list few open source projects that might be interesting to you:
* CLASP - common lisp implementation build on top of LLVM. this project is headed by a computational chemist (prof. schafmeister) [0]
* PetaLisp - for HPC [1]
* MAGICL - Matrix Algebra proGrams In Common Lisp by Rigetti Computing [2]
* Quil - quantum instruction language developed on top of common lisp [3]
* CL-CUDA - use NVIDIA CUDA in Common Lisp programs [4]
* MGL - Common Lisp machine learning library [5]
* Maxima - a computer algebra system developed in common lisp [6]
i think these are plenty to show that there exists serious interest in using common lisp for scientific computing. it is possible that some of the people behind these projects are on HN and maybe they can further expand on this
[0] https://github.com/clasp-developers/clasp
[1] https://github.com/marcoheisig/Petalisp
[2] https://github.com/quil-lang/magicl
[3] https://github.com/quil-lang/quilc
[4] https://github.com/takagi/cl-cuda
[5] https://github.com/melisgl/mgl also https://github.com/melisgl/higgsml
I really like clojure, like, a lot, but I might have taken an opposite than intended result from this blog post. It made me feel like the language has a number of performance footguns that don't necessarily need to exist.
It does feel like a tradeoff that was a bit too harsh. As the post shows, you can work around it... but at the end of the day my code ends up with a ton of (nth 0) and very few (first) calls. It feels a bit ugly.. and it doesn't feel "idiomatic" as in how you learn Clojure from a textbook. Some more discussion of it here: https://clojureverse.org/t/what-do-beginners-struggle-with/5...
I find the threaded stuff much easier to follow. Maybe it's mostly visual and familiarity
And as an interesting extra: https://clojureverse.org/t/x-x-auto-transducifying-thread-ma...
(def array-xf
(comp
(sliding-array 8 1)
(keep (fn [^objects arr]
(when (> 1000 (unchecked-subtract
(long (aget arr 7))
(long (aget arr 0))))
arr)))))
This is a standard feature of Clojure you learn called a transducer pipeline. It describes a pipeline of transformation for transforming a source:(sliding-array 8 1)
This says to iterate 8 elements at a time in a sliding window manner, so you grab element 0 to 7, then 1 to 8, then 2 to 9, etc., and where the type of the windows will be an array.
(keep fn)
This says that for each of those sliding window, keep only the non-nil values returned by fn.
(fn [^objects arr] (when (> 1000 (unchecked-subtract (long (aget arr 7)) (long (aget arr 0)))) arr))
Finally, this is the function used by keep. It receives as input an array of each sliding window of 8 elements, and it says when the value of the last element in it minus the first element in it is greater than 1000 return the sliding window, otherwise return nil.
In effect, the whole thing will thus return only the sliding windows for which the value of the last minus first element in the window is greater than 1000.
(sequence array-xf times-v)
array-xf only describes the computation, it is generic to the source, so you need to apply it to a source, that's what this code does. It says to apply array-xf to times-v and give you a sequence of the result.
For Clojure, we get the wonderful quote "There is a close transducer, partition-all, which has no step arity.".
One of Clojure's biggest weaknesses in practice is that breaking in to those functional structures to figure out where the time is being spent or to debug them is harder than in other languages. This is a natural trade-off of developing a terse and powerful language.
It is notable how much trouble Clojure has had over the years linking a bug back to a line number and a reason. Even with spec.
Not that hard if you use something like YourKit. There's also a quite good Clojure library https://github.com/hugoduncan/criterium .
In my opinion they are a straight-up an optimization and using them definitely reduces code readability for the vast majority of people who would ever read your code (yes, it makes your code less readable even if it sometimes doesn't increase the token count of your code)
Transducers, for example, have a couple of counter-intuitive features that make them harder (for me, at least) to read... the order of "comp" arguments, for example, is reversed from normal usage. We tend to avoid them at my work unless they are truly needed for performance. In a complicated business domain, we struggle much more with building understanding than with slow code, so readability matters quite a bit.
(I leave aside the question of what is "idiomatic," since it seems pretty subjective, and probably depends somewhat on the crowd you run with.)
As several people have written here, Clojure is fast enough without optimization most of the time. (I leave aside the question of startup time -- a very different discussion.) Though I'm curious how to get good performance, most of the time it matters more to me for the code to be easier to understand than for it to run optimally fast, unless the code creates a bottleneck that has some clear impact on usability.
My goal in writing the original article was to play with optimizing Common Lisp as a learning exercise, rather than to throw down the gauntlet and assert that "Clojure is slow." The author's response was to show how to optimize the Clojure, which was very interesting! Ultimately, both languages have their strengths and weaknesses, and I consider myself lucky to be able to write Clojure for pay.
Times are all from my laptop using the same test input and running on JDK 11.
Baseline smt-8: Execution time mean : 2.386019 sec
My first attempt: Execution time mean : 48.977240 ms
(defn smt-8'' [times-vec]
(loop [res (transient []) pointer-1 0 pointer-2 7]
(if-let [end-element (get times-vec pointer-2)]
(let [start-element (get times-vec pointer-1)
time-diff (- end-element start-element)]
(recur
(if (< time-diff 1000)
(conj! res [(subvec times-vec pointer-1 (inc pointer-2))
time-diff])
res)
(inc pointer-1)
(inc pointer-2)))
(persistent! res))))
With performance tweaks: Execution time mean : 23.567174 ms (defn smt-8''' [times-vec]
(binding [*unchecked-math* true]
(loop [res (transient []) pointer-1 (int 0) pointer-2 (int 7)]
(if-let [end-element (get times-vec pointer-2)]
(let [start-element (get times-vec pointer-1)
time-diff (- end-element start-element)]
(recur
(if (< time-diff 1000)
(conj! res [(subvec times-vec pointer-1 (inc pointer-2))
time-diff])
res)
(inc pointer-1)
(inc pointer-2)))
(persistent! res)))))
Less idiomatic as I'm using an array: Execution time mean : 1.678226 ms (defn smt-8'' [^"[J" times-arr]
(binding [*unchecked-math* true]
(loop [res (transient []) pointer-1 (int 0) pointer-2 (int 7)]
(if (< pointer-2 (alength times-arr))
(let [start-element (aget times-arr pointer-1)
end-element (aget times-arr pointer-2)
time-diff (- end-element start-element)]
(recur
(if (< time-diff 1000)
(conj! res [(mapv #(aget times-arr (+ pointer-1 %)) (range 8))
time-diff])
res)
(inc pointer-1)
(inc pointer-2)))
(persistent! res)))))
All my solutions produce exactly the same result where it includes both the list of elements and their time difference.(set! unchecked-math true)
globally, wrapping in a binding like I did does nothing, since it needs to be set at compile time. My measurement were with it set globally.
Immutability has a price and it's a price we're consciously paying. JVM also has a performance cost, but it's well worth paying due to the wins of running on a virtual machine.
For most boiler-plate tasks, especially where I/O is involved, Clojure is fast enough (eg. handling HTTP requests or talking to a database).
The true power of Clojure, in my opinion, is in how it allows you to think about problems; that machines produce side effects by interpreting a data structure, that the program is one transformation of the input data structure to the target data structure for the desired side effect. You don't think in terms of imaginary objects 'doing' things, you think about the input data structure and how the output data structure should look like, such that if fed to a library function, the desired side effect will occur (eg. the text will be printed or the robot arm will rise).
This mode of thinking about problems is closer to how machines actually work on a higher level and amusingly, 'close to the metal' C++, offers abstractions which take the writer into a fairy tale of class hierarchies, methods that 'do' things to other things, inserting, deleting, mutating everything all over the place. Because that's efficient and fast (and dangerous). But it's also focused a lot on telling the machine 'how' to run; while coincidentally making it do 'what' you want it to do. As a wizard who can put that orchestra with swords and knifes in motion without too much blood spilling on the floors, you can't but feel proud to have achieved it.
Data in -> data out. That's Clojure's way of thinking.
It's a lot simpler way of looking at the problem and maybe makes the `->` feel a bit insignificant and maybe not that important in most cases, as long as it produces the correct output.
The biggest performance hit will be choosing the wrong data structure for solving the problem in a functional way. If that's done correctly, Clojure does offer some tools for squeezing a bit more out of performance critical pieces of code (the articles covers some). Those should be a couple of functions in all your code base; for the boilerplate and domain logic or code involving non realtime I/O, it doesn't matter that much.
If performance is a big concern and you end up writing imperative code in Clojure, I think you'll be better off writing the bits in Java or choose a different language, which is better designed for that purpose.