Two comments:
- LLVM IR is actually remarkably stable these days. I was able to rebase Fil-C from LLVM 17 to 20 in a single day of work. In other projects I’ve maintained an LLVM pass that worked across multiple LLVM versions, and it was straightforward to do.
- LICM register pressure is a big issue, especially when the source isn’t C or C++. I don’t think the problem here is necessarily LICM; it might be that regalloc needs to be taught to rematerialize.
It knows how to rematerialize, and has for a long time, but the backend is generally more local/has less visibility than the optimizer. This causes it to struggle to consistently undo bad decisions LICM may have made.
That's very cool, I didn't realize that.
> but the backend is generally more local/has less visibility than the optimizer
I don't really buy that. It's operating on SSA, so in practice it has exactly the same view as LICM (to my knowledge, LICM doesn't cross function boundaries).
LICM can't possibly know the cost of hoisting, whereas regalloc does have decent visibility into cost. That's why this feels like a regalloc remat problem to me.
I'm by no means an LLVM expert, but my takeaway from when I played with it a couple of years ago was that it is more like a union of different languages. Every tool and component in the LLVM universe had its own set of rules and requirements for the LLVM IR it understands. The IR is more like a common vocabulary than a common language.
My bewilderment about LLVM IR not being stable between versions has given way to the understanding that this freedom is necessary.
Do you think I misunderstood?
No. Here are two good ways to think about it:
1. It's the C programming language represented in SSA form, with some of the UB in the C spec given a strict definition.
2. It's a low-level representation, suitable for lowering other languages to. Theoretically, you could lower anything to it, since it's Turing-complete. Practically, it's only suitable for lowering sufficiently statically typed languages to it.
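As a rough illustration of point 1, this is approximately what clang emits for a trivial C function (the exact output varies by version, target, and optimization level). Note the `nsw` flag: it is where C's signed-overflow UB gets its strict, instruction-level definition.

```llvm
; int add(int a, int b) { return a + b; }
define i32 @add(i32 %a, i32 %b) {
entry:
  ; "nsw" (no signed wrap): signed overflow produces a poison value,
  ; LLVM's strict encoding of the UB in the C spec.
  %sum = add nsw i32 %a, %b
  ret i32 %sum
}
```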
> Every tool and component in the LLVM universe had its own set of rules and requirements for the LLVM IR that it understands.
Definitely not. All of those tools have a shared understanding of what happens when LLVM executes on a particular target and data layout.
The only flexibility is that you're allowed to alter some of the semantics on a per-target and per-datalayout basis. Targets have limited power to change semantics (for example, they cannot change what "add" means). Data layout is its own IR, and that IR has its own semantics - and everything that deals with LLVM IR has to deal with the data layout "IR" and has to understand it the same way.
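For example, the data layout string clang uses for a typical x86-64 ELF target looks roughly like this (the exact string varies by LLVM version and target; the mnemonics are documented in the LLVM LangRef):

```llvm
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
; e            little-endian
; m:e          ELF-style symbol mangling
; i64:64       i64 is 64-bit aligned
; f80:128      x86 long double is 128-bit aligned
; n8:16:32:64  native integer widths
; S128         natural stack alignment is 128 bits
```

Every pass that reasons about sizes, alignment, or addressing has to interpret this string the same way, which is what makes it an "IR" in its own right.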
> My bewilderment about LLVM IR not being stable between versions had given way to understanding that this freedom was necessary.
I'm not parsing this statement very well, but bottom line: LLVM IR is remarkably stable because of Hyrum's law within the LLVM project's own repository. There's a TON of code in LLVM that deals with LLVM IR, so it's super hard to change even the smallest things about how LLVM IR works or what it means: any such change would surely break at least one of the many things in the LLVM repo.
What would be neat is to expose all the right knobs and levers so that frontend writers can benchmark a number of possibilities and choose the right values.
I can understand this is easier said than done of course.
The reason to couple it to regalloc is that you only want to remat if it saves you a spill.
I love LLVM, though. clang-tidy, ASAN, UBSAN, LSAN, MSAN, and TSAN are AMAZING. If you are writing C or C++ and NOT using clang-tidy, you are doing it wrong.
My biggest problem with LLVM right now is that -fbounds-safety is only available on Xcode/AppleClang and not LLVM Clang, while MSAN and LSAN are only available on LLVM Clang and not Xcode/AppleClang. Also, Xcode doesn't ship clang-tidy, clang-format, or llvm-symbolizer. It's kind of a mess on macOS right now. I basically rolled my own Darwin LLVM for LSAN and clang-tidy support.
The situation on Linux is even weirder. RHEL doesn't ship libcxx, but Fedora does ship it. No distro has libcxx instrumented for MSAN at the moment which means rolling your own.
What would be amazing is if some distro would just ship native LLVM with all the things working out of the box. Fedora is really close right now, but I still have to build compiler-rt manually for MSAN support.
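For anyone stuck rolling their own: the recipe in the LLVM MemorySanitizer docs boils down to rebuilding libc++ with MSan instrumentation and pointing your compile at it. A rough sketch (paths and the runtimes list are illustrative; check the current docs for your LLVM version):

```shell
# Build an MSan-instrumented libc++ out of an llvm-project checkout.
cmake -GNinja -S llvm-project/runtimes -B build-msan \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
  -DLLVM_USE_SANITIZER=MemoryWithOrigins
ninja -C build-msan cxx cxxabi

# Then build your own code against it instead of the system libc++.
clang++ -fsanitize=memory -stdlib=libc++ \
  -nostdinc++ -isystem build-msan/include/c++/v1 \
  -L build-msan/lib -Wl,-rpath,build-msan/lib main.cpp
```

This is exactly the step a distro could do once so nobody else has to.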
Omarchy could and should do this; it's nice low-hanging fruit.
@dhh, if you're listening, the other good thing Omarchy could do is support the VFX Reference Platform specs maintained by the ASWF. That would bring in all of the Linux-based VFX software to Omarchy in a clean way.
I mean, Chimera Linux is pretty LLVM native.
If you get “credit” for contributing when you review, maybe people (and even employers, though that is perhaps less likely) would find doing reviews to be more valuable.
Not sure what that looks like; maybe whatever shows up in GitHub is already enough.
I remember part of the selling point of LLVM during its early stage was compilation time being so much faster than GCC.
LLVM started about 15 years after GCC, and LLVM is already 23 years old. I wonder if something new will pop up again.
Discussion: https://news.ycombinator.com/item?id=45072481
There are also codegen projects that don't use LLVM IR and are faster, like Cranelift: https://github.com/bytecodealliance/wasmtime/tree/main/crane...
Thus, what would be the commercial reason to support LLVM's successor, especially since the companies that were responsible for LLVM going mainstream are happy with the current C and C++ support, and mostly use LLVM for other programming languages' frontends?
LLVM is actually really, really good at what it does (compiling C/C++ code). Not perfect, but good enough that it would take tens of thousands of competent engineering hours to match it.
Build time wasn’t great, but it was tolerable, so long as you reduced link parallelism to squeeze inside the memory constraints.
Is it still possible to compile LLVM on such a machine, or is 8 GB no longer workable at all?
LLVM compiles in less than an hour on my old M1 Mac in all the build configurations I have tried so far.
Rust is also substantially faster to compile than it was a few years ago, so I have some hope for improvements in that area as well.
This certainly varies across different parts of llvm-project. In flang, there's very much a "long tail": according to "git blame", 80% of its 654K lines are attributed to the 17 contributors (out of 355 total) who are each responsible for 1% or more of them.
LLVM of course has plenty of contributors who only ever landed one change, but the thing that matters for project health is that the group of "top contributors" is fairly large.
(And yes, this does differ by subproject, e.g. lld is an example of a subproject where one contributor is more active than everyone else combined.)
Part of the reason I'm not ready to go all in on Rust is that I'm not willing to externalize that much complexity in the programs I make.
It is used in the Hare language
Among the people who are familiar with such things, yes. An RFC on the topic will be posted in the near future.
We miss you!
On top of that, there is little incentive for contributors to invest in the C API: most LLVM users and developers interact with the C++ API directly, so new features and options tend to be added there first, and often exclusively. As a result, the C API inevitably lags behind and remains a second-class citizen.
Optimizing compilers are basically impossible to audit, but there are tools like alive2 for checking them.
That would require the LLVM devs to be stupid and/or evil. As that is not the case, your supposition is not true either. They might be willing to accept churn in the service of other goals, but they don't have churn as a goal unto itself.
For starters, the tooling would be much slower if it required LLVM.