There is one thing that I think is important to bear in mind when discussing inlining, especially in the context of Clojure: once a function has been inlined, you can no longer update its definition in the REPL and have that change the behaviour of the functions which use it, unless you recompile those as well. This is not a criticism, of course; it’s just part of the natural tension between dynamism and performance.
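To make that tradeoff concrete, here is a minimal C++ sketch. It is only a model, not how Clojure or jank implement vars: `step_var` is a hypothetical rebindable slot standing in for a var, and `caller_direct` stands in for a caller that was compiled against one specific definition.

```cpp
#include <cassert>
#include <functional>

// Two versions of the same function: the original and a "REPL redefinition".
// All names here are hypothetical and exist only for illustration.
int step_v1(int x) { return x + 1; }
int step_v2(int x) { return x + 10; }

// A rebindable global slot, standing in for a var.
std::function<int(int)> step_var = step_v1;

// Looks the function up through the slot on every call,
// so it always sees the latest binding.
int caller_via_var(int x) { return step_var(x) * 2; }

// Bound to step_v1 at compile time. The compiler is free to inline it,
// but rebinding step_var later cannot change what this caller does.
int caller_direct(int x) { return step_v1(x) * 2; }
```

After `step_var = step_v2;`, `caller_via_var` picks up the new definition while `caller_direct` keeps the old behaviour until it is recompiled.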
In Julia, whenever you call a function, that function and any calls in that call stack occur in a fixed 'world age'. Within a given world age, method tables and global constants are all fixed, and the language can be analyzed as if it were statically typed (there are escape hatches like `invoke_in_world` and `invokelatest`).
Between world-ages, things are allowed to change. When a function calls another function, we add a 'backedge' from the caller to the callee.
So if I have `f(x) = g(x)` and `g(x) = h(x)`, and I redefine `h`, we say the old definition of `h` is no longer valid. Then we follow the backedge that leads from `h` to `g` and say the old definition of `g` is also no longer valid, and then we go from `g` to `f` and invalidate the old definition of `f` too.
This means that once `f` is called in a new world age (the world age gets incremented every time a new method is (re)defined, or if a global const is changed / defined), the compiler knows that it has to recompile `f`, `g`, and `h`. What's especially cool is that this system works regardless of inlining, and it allows us to safely do all sorts of interprocedural optimizations, but in a JIT compiled language.
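The invalidation walk can be sketched roughly like this in C++. This is a toy model and not Julia's actual data structures: `CodeCache` and all of its members are hypothetical names, and it assumes a call chain in which `f` calls `g` and `g` calls `h`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical model: every compiled function remembers which callers
// were compiled against it (its backedges).
struct CodeCache {
    // callee -> callers that depend on the callee's current definition
    std::unordered_map<std::string, std::vector<std::string>> backedges;
    // functions whose compiled code is still valid
    std::unordered_set<std::string> valid;
    // bumped on every (re)definition
    unsigned long world_age = 0;

    void mark_compiled(const std::string& name) { valid.insert(name); }

    // Record that `caller` was compiled assuming the current `callee`.
    void add_backedge(const std::string& callee, const std::string& caller) {
        backedges[callee].push_back(caller);
    }

    // Redefining `name` starts a new world age and invalidates `name`
    // plus everything reachable from it by following backedges.
    void redefine(const std::string& name) {
        ++world_age;
        invalidate(name);
    }

private:
    void invalidate(const std::string& name) {
        valid.erase(name);
        auto it = backedges.find(name);
        if (it == backedges.end()) return;
        // Take the caller list and drop the edges before recursing, so
        // cycles (e.g. recursion) terminate. Edges would be re-added
        // when the callers get recompiled.
        std::vector<std::string> callers = std::move(it->second);
        backedges.erase(it);
        for (const auto& caller : callers) invalidate(caller);
    }
};
```

With backedges `h -> g` and `g -> f` registered, calling `redefine("h")` leaves none of the three marked valid, while an unrelated compiled function is untouched. Whether a callee was inlined into its caller does not matter to this walk, since the edge is recorded either way.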
I'm unsure exactly how jank works WRT this tradeoff, but the article makes it sound like it's closer to the direct linking version, but with the inlining etc being done by jank rather than the JVM. I don't know if this is only for AOT or also in JIT cases.
Right, as you said, you'd have to recompile dependents.
The JVM gets a lot of hate, but that is a very high bar. The JVM is a serious piece of kit. I hope Jank succeeds. I'd love to use it in real projects.
Other JVMs have plenty of goodies. Some have had AOT compilation for about 20 years now; others have real-time GC, JIT caches from before Project Leyden was even an idea, actual value types as an experiment (ObjectLayout on Azul), pauseless GC, cloud-based JIT compilers, or bare-metal deployments. ART also has its goodies, somehow, despite everything. There is a whole world that is lost when people focus too much on JVM == OpenJDK.
Even clang is now in the process of doing the same.
> We're going to use Clojure JVM to get our baseline benchmark numbers and then we'll aim to beat those numbers with jank.
> Note that all numbers in this post are measured on my five year old x86_64 desktop with an AMD Ryzen Threadripper 2950X on NixOS with OpenJDK 21. When I say "JVM" in this post, I mean OpenJDK 21.
In 2026, a better baseline would be the Java 26 implementations of OpenJDK, OpenJ9, and GraalVM, with JIT cache across several execution runs.
> In the native world, we don't currently have JIT optimization. It could exist, but LLVM doesn't have any implementation for it and neither does any major C or C++ compiler
Yes, they kind of do. That is partially what PGO is used for: capturing the program's behaviour during training runs and feeding it back into the compilation toolchain.
Also while it isn't native code per se, when targeting bytecode environments like IBM i, WebAssembly, CLR, among others, with C or C++, there is certainly the possibility of having a JIT in the picture.
> Finally, just because jank is written in C++ doesn't mean that we can escape Clojure's semantics. Clojure is dynamically typed, garbage collected, and polymorphic as all get out.
Which is why benchmarks should also take into account compilers for Common Lisp and Scheme.
Anyway, great piece of work, and it was a very interesting post to read, best wishes to the author finding some support.
Basically, the idea is to do as much 'high level' optimization and transformation stuff as you can in your own IR, and then let LLVM handle the low-level stuff and the targeting of specific hardware vendors.
> Clojure's dynamism is granted by a great deal of both polymorphism and indirection, but this means LLVM has very few optimization opportunities when it's dealing with the LLVM IR from jank.
In my mind, what is happening here is that you lower Clojure code into LLVM IR with a bunch of runtime calls (e.g. your `jank::runtime::dynamic_call`), i.e. LLVM invoking the runtime over a C ABI.
If that's true, are there any optimizations that LLVM helps out with? Perhaps DCE? I can't tell immediately; curious about the answer.
(question is obviously about the pre-IR state of things)
It also mentions that in Clang the runtime max function will itself be inlined, so that's something LLVM ("the LLVM project", anyway) is still doing - and beyond that, as written this IR is likely to leave behind plenty of opportunities for LLVM to do the things it's good at: DCE, load/store optimisation, constant propagation, etc. And register allocation.
The jank::runtime::max call is itself complex: it's got to type check its arguments and work out what to actually do based on the two types; if parts of these tests are done before the inlined call to max, there's a fair chance that LLVM will be able to eliminate their repetition and slim it all down a long way. In the Fibonacci example, the fact that a previous test will likely have identified whether the argument is an int or something else should hopefully carry over for ::lte, ::sub, and ::add and simplify those down to just the single operator call. Sadly, I suspect it won't, at least for the addition, because the recursive call will lose the information that the return value when called with a tagged integer is always a tagged integer.
A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR (:metadata tag functions as specialised for <type> with the new entry point, if a function only calls specialised functions (and itself) it too can be specialised, and a heuristic to determine if specialisation gains enough to sacrifice space for it).
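To show what I mean, here is a rough C++ sketch. The tagging scheme (odd word = small integer stored as 2n + 1) and every name in it are assumptions made up for illustration; this is not jank's runtime, and whether an optimiser actually removes the checks depends on inlining decisions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical tagging scheme: an odd word is the integer n stored as 2n + 1.
using value = std::intptr_t;

inline bool is_int(value v) { return (v & 1) != 0; }
inline value box(std::intptr_t n) { return n * 2 + 1; }
inline std::intptr_t unbox(value v) { return (v - 1) / 2; }

// Stand-in for the generic, non-integer dispatch path.
[[noreturn]] inline void slow_path() { std::abort(); }

inline bool lte(value a, value b) {
    if (!(is_int(a) && is_int(b))) slow_path();
    return unbox(a) <= unbox(b);
}

inline value sub(value a, value b) {
    if (!(is_int(a) && is_int(b))) slow_path();
    return box(unbox(a) - unbox(b));
}

inline value add(value a, value b) {
    if (!(is_int(a) && is_int(b))) slow_path();
    return box(unbox(a) + unbox(b));
}

value fib(value n) {
    // lte tests is_int(n); on the path that continues, n is an integer.
    if (lte(n, box(1))) return n;
    // Once sub is inlined, its is_int(n) test repeats one that already
    // passed, so an optimiser has a chance to drop it.
    value a = fib(sub(n, box(1)));
    value b = fib(sub(n, box(2)));
    // Nothing local proves that fib returns a tagged integer, so the
    // tests inside add are likely to stay.
    return add(a, b);
}
```

Calling `unbox(fib(box(10)))` gives 55. The interesting part is which `is_int` tests survive in the generated code, not the result.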
> A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR
All of these math functions are templates with four specific categories:
1. Object and object
2. Primitive and primitive
3. Primitive and object
4. Object and primitive
We handle the difference between typed objects (like integer_ref) and type-erased objects (object_ref) as well. This template then gets inlined, which is exactly what the last step of the benchmark optimizations (adding annotations) ensured. The return type of these functions will prefer primitive types, rather than automatically boxing. jank's analyzer tracks all types used, at compile-time, and supports automatic boxing. This means that we're already using the most optimal primitive math whenever we can and that it will indeed inline to just an operator call when working on two primitives, or two typed objects, or a combination thereof.
You can see the code for this here: https://github.com/jank-lang/jank/blob/29c2adb344526d26c8e82...
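For readers who want the general shape without opening the link, here is a simplified C++ sketch of the four categories. It is not the linked jank code: `boxed_integer`, `boxed_integer_ref`, `unwrap`, and `add` are hypothetical names, and type-erased objects (which would need a runtime type check) are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a typed, boxed integer object.
struct boxed_integer { std::int64_t data; };
using boxed_integer_ref = std::shared_ptr<boxed_integer>;

// Get the raw value out of either a primitive or a typed object.
inline std::int64_t unwrap(std::int64_t v) { return v; }
inline std::int64_t unwrap(const boxed_integer_ref& v) { return v->data; }

// One template covers all four combinations:
// object/object, primitive/primitive, primitive/object, object/primitive.
// It returns a primitive instead of boxing the result, and once inlined
// the primitive/primitive case is just a single `+`.
template <typename L, typename R>
inline std::int64_t add(const L& l, const R& r) {
    return unwrap(l) + unwrap(r);
}
```

For example, with `auto b = std::make_shared<boxed_integer>(boxed_integer{40});`, both `add(b, std::int64_t{2})` and `add(std::int64_t{2}, b)` evaluate to 42 without allocating a new box.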
https://carp-lang.github.io/carp-docs/LanguageGuide.html
Has anyone on HN been playing with it?
But to be completely honest, the question "do you need wasm at all...?" should always be followed by "why?". For like 95% of cases, ClojureScript saves you weeks/months of work. Easier to build, easier to maintain. That's subjective, of course. Most Rustaceans don't even want to try Clojure. Most Clojurists find Rust to be needlessly complex.
LLVM at the IR level has no understanding of the semantics of your language and therefore can't do the kind of optimisations that will really make a big speed difference.
I ended up having two custom IRs for a very high performance compiler I maintain at work. It made a big difference.