It's not that big of a deal, but active effort is required. The amount of effort depends on how many and which APIs your project uses; for a small/medium project perhaps a couple of hours every couple of weeks.
Honestly, keeping up seems like less of a concern for research projects. Most projects will be abandoned after providing grist for a few papers. The most useful and practical will get integrated back into LLVM. The ones that are not yet practical but of ongoing interest will have multiple groups working on them. A good example of this is Polly.
Conservative collectors required no support from the runtime, as they operate by treating every word-sized value on the stack and heap as a potential reference. All you need for a conservative collector is to know where your stack starts and ends, where your heap begins and ends, and where your globals are.
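To make the "every word is a potential reference" idea concrete, here is a toy sketch of a conservative mark phase. It is a simulation, not real runtime code: memory is modelled as Python integers, and the names conservative_mark, heap, and stack are made up for illustration. It also assumes a word only counts as a pointer if it hits the exact start of an allocated object; real conservative collectors usually have to deal with interior pointers too.

```python
def conservative_mark(roots, heap_start, heap_end, objects):
    """Mark every object that some root word might point to.

    roots   -- every word found on the stack and in globals
    objects -- maps an object's start address to the list of words it holds
    """
    def looks_like_pointer(word):
        # No type information: anything that lands on an allocated
        # object inside the heap bounds is treated as a reference.
        return heap_start <= word < heap_end and word in objects

    marked = set()
    worklist = [w for w in roots if looks_like_pointer(w)]
    while worklist:
        addr = worklist.pop()
        if addr in marked:
            continue
        marked.add(addr)
        # Scan every word of the object the same way the stack was scanned.
        worklist.extend(w for w in objects[addr] if looks_like_pointer(w))
    return marked


heap = {
    0x1000: [0x1010, 7],  # holds a reference to 0x1010
    0x1010: [42, 0],      # no references
    0x1020: [0x1000, 1],  # not reachable from the roots below
    0x1030: [3, 3],
}
# 0x1030 could just be an integer that happens to look like an address;
# a conservative collector has to keep the object alive anyway.
stack = [5, 0x1000, 0x1030, 0xDEADBEEF]
live = conservative_mark(stack, 0x1000, 0x1040, heap)
```

The object at 0x1030 shows the cost of this approach: a stray integer is enough to keep garbage alive, and because the collector can never be sure a word is a pointer, it cannot safely move objects either.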
We went through a couple of designs for precise collectors, and they all used the same meta-information. A precise collector needs to know what is a reference and what is not, so your compiler needs to emit stack maps (tables that give, for a given PC in a function, the offsets of all the live objects), global root information, and object layout information.
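As a rough illustration of what a stack map lookup amounts to, here is a small sketch. The table layout, the function name "f", the PCs, and the offsets are all invented for the example; real compilers use compact binary encodings, and this only shows the shape of the information.

```python
# Hypothetical stack maps: for each function, each GC safe-point PC maps to
# the frame offsets that hold live references at that point.
STACK_MAPS = {
    "f": {
        0x10: (8,),      # at the first safe point only slot 8 is a live reference
        0x24: (8, 16),   # later, slot 16 holds a live reference as well
    },
}


def frame_roots(function, pc, frame_base, memory):
    """Return the references held in one stack frame at a given safe point."""
    offsets = STACK_MAPS[function][pc]
    return [memory[frame_base + off] for off in offsets]


# A fake frame at address 0x7000: slot 24 holds a plain integer, which the
# precise collector never looks at because the stack map does not list it.
memory = {0x7000 + 8: 0x1000, 0x7000 + 16: 0x1010, 0x7000 + 24: 99}
roots_early = frame_roots("f", 0x10, 0x7000, memory)
roots_late = frame_roots("f", 0x24, 0x7000, memory)
```

Unlike the conservative approach, the collector here never has to guess: if the compiler did not list a slot, it is not a root.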
Sometimes this becomes more complex. Object layout information included both the location of references within type instances and whether or not an object is an array. Arrays of value types require both.
.NET also has a type of reference, a managed pointer, which can be used to refer to the middle of an object, such as a pointer to a value type in the middle of an array, which contributes to the parent object's liveness information.
To sum up, conservative garbage collection can be easy to add. Precise garbage collection much less so. Language specifics can throw a big wrench into things.
If you want a precise garbage collector, you need type information to know which heap and stack slots and registers are pointers. This is much harder because your compiler needs to emit metadata for each struct and then keep stack frame info or somesuch at the GC safe points so the runtime knows what the thread's roots are.
Most of the precise GCs I know of are in JIT'd or interpreted languages where you have this type info anyway... AOT-compiled Lisp is the one counterexample I can think of, but usually Lisps solve this in a different way by using tagged pointers (so you know if any heap slot is a pointer by e.g. its LSB).
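For anyone unfamiliar with the tagged-pointer trick, here is a minimal sketch. It assumes one particular convention (low bit set means an immediate integer, low bit clear means an aligned pointer); the exact tagging scheme differs between implementations, and the helper names are made up.

```python
def tag_int(n):
    """Encode an integer as an immediate word: shift left, set the low bit."""
    return (n << 1) | 1


def untag_int(word):
    """Recover the integer from an immediate word."""
    return word >> 1


def is_pointer(word):
    # Heap objects are assumed to be at least 2-byte aligned, so a real
    # pointer always has its least significant bit clear.
    return word & 1 == 0


# A heap object's slots: the collector can tell references from integers
# by looking at each word, with no per-type metadata from the compiler.
slots = [tag_int(21), 0x1000, tag_int(0), 0x1018]
pointers = [w for w in slots if is_pointer(w)]
```

The price is one bit of integer range and a shift on arithmetic, in exchange for a collector that can be precise without stack maps or layout tables.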
The issue is that most compilers with precise GC---AOT or otherwise---are built around that assumption from the very beginning. Therefore, they choose representations, algorithms, and optimizations that are amenable to GC, and in particular don't destroy the type info necessary to make GC work.
It is possible in theory to do the same with LLVM, but difficult in practice. Most existing LLVM-based GC'd languages compromise on performance by requiring LLVM to "root" pointers, effectively forcing them to stay on the stack (instead of registers) so that the GC can see them when it needs to. LLVM's optimizations (particularly at the machine-specific level) were previously allowed to do arbitrary things with register-based pointers, which is obviously bad for a GC if no copies of those pointers exist on the stack. It just takes a long time to take a code base that large and untangle the code from such basic assumptions.
This appears to be changing, as the release notes for LLVM 3.6 indicated. But it will take time. In the meantime, languages on LLVM are either going with conservative GCs, or with precise GCs with the performance degradation noted above (e.g. OCaml), or with no GC (e.g. Rust for the time being).
Source: My talk on precise GC for Rust (from 2012): https://github.com/elliottslaughter/rust-gc-talk
Most of the precise GCs I know of are in JIT'd or interpreted languages
GC is orthogonal to JITing. OCaml and Haskell are two examples of languages with non-JIT AOT compilers (ocamlopt and ghc, respectively) that do precise GC. I'm sure there are plenty of others. A precise garbage collector for a normal-ish implementation of C, assembly, or Forth is potentially very difficult, though.
If you're interested in learning more about LLVM, there are some good open source projects that use it. If you aren't using C++, people have also ported the Kaleidoscope tutorial project to Haskell, Rust, C, etc. Additionally, a lot of bigger compilers like Rust and Clang use it. Swift also uses LLVM, and should be open source soon?
We use LLVM at MapD (http://mapd.com) to compile SQL queries to CPU and GPU machine code - it has given us a major boost over an interpreter-based approach. For more see here - http://devblogs.nvidia.com/parallelforall/mapd-massive-throu.... If you have a background in LLVM or compilers in general and are interested in tackling problems like this, please reach out at jobs@mapd.com.
You should contact Rick if you guys need to pick his brain. He is a very down-to-earth man. And he is retired, and lives in the Morristown, NJ area.
http://web2.research.att.com/export/sites/att_labs/projects/...
What on earth is that supposed to mean?
But once I got past the basic hurdles, the LLVM project code was so well written, I felt a certain pleasure working with it.
I have also attempted to port LLVM to new CPU architectures. This is a totally different deal --- it's poorly documented, very hard to make progress with, and peculiarly unfinished. (e.g. the pattern matcher language, which is excellent, is weirdly unable to match certain types of pattern, which requires you to write big chunks of C++ which manually look for particular DAG patterns and convert them into other patterns.) Debugging is a pain; get a pattern wrong and it'll just hang as the pattern matcher state machine goes insane, or else produce weird, contorted and incomprehensible error messages.
I would say that it's probably about as painful as gcc, although the pain points are in very different places. And LLVM actually has people and momentum behind it, while the gcc mailing lists are dead quiet.
I'm currently investigating libfirm, which is the open source C compiler nobody's ever heard of. It looks really rather nice, and builds in less time than LLVM takes to run its configure script...
Wal-Mart?
http://www.bitfusion.io/jobs.php
Mazhar Memon, Bitfusion.io
For those interested in seeing examples of LLVM hacking in action, I would recommend reading the source for Halide – https://github.com/halide/Halide – which is an image-processing DSL implemented in C++ and piggybacking on LLVM. I myself learned a lot about LLVM this way.