LLVM for Grad Students

(adriansampson.net)

439 points · samps · 10y ago · 50 comments

regehr · 10y ago · 8 in thread

Adrian might have mentioned an important drawback of using LLVM: active effort is required to keep up with it. Many an LLVM-based research project has gotten stuck on 2.9 or 3.2 at which point it starts becoming less and less relevant.

It's not that big of a deal, but active effort is required. The amount of effort depends on how many and which APIs your project uses; for a small/medium project perhaps a couple of hours every couple of weeks.

samps (OP) · 10y ago

Great point, John. Can I have a comments section on my site where only you are allowed to comment?

regehr · 10y ago

Sure :).

emeryberger · 10y ago

Glad you said this - it's a pain point, for sure. My PhD student Charlie Curtsinger (who will be joining Grinnell College this Fall) developed Stabilizer using LLVM (see http://emeryberger.com/research/stabilizer/, http://www.cs.umass.edu/~emery/pubs/stabilizer-asplos13.pdf), and it is "stuck" for now in exactly the way you describe (he plans to fix it soon, but it will take a solid week or two). Of course, YMMV: Stabilizer is by its nature pretty invasive -- it randomizes code and stack frames dynamically during execution (in addition to doing fine-grained heap randomization), and this touches a lot of stuff.

ufo · 10y ago

I often hear about llvm breaking backwards compatibility but I don't know much about the specifics. What are the parts that are more prone to change? Are there any parts that are more stable?

mattgrice · 10y ago

This seems like a strange observation. Active effort is required to keep up with any compiler framework, except the ones that aren't going anywhere. LLVM and Clang at least attempt to provide fixed interfaces for external code. In contrast, the GCC maintainers are actively hostile towards that idea.

Honestly, keeping up seems like less of a concern for research projects. Most projects will be abandoned after providing grist for a few papers. The most useful and practical will get integrated back into LLVM. The ones that are not yet practical but of ongoing interest will have multiple groups working on them. A good example of this is Polly.

regehr · 10y ago

Many people have observed that LLVM appears to move unusually quickly. This is good except when it's bad.

ckok · 10y ago

Is there an alternative to llvm without this drawback (specifically for code generation)?

sklogic · 10y ago

I found that sticking to llvm-c interface makes things a bit better.

amelius · 10y ago · 8 in thread

How difficult would it be to add a garbage collector to a language that you have written a compiler for?

maximilianburke · 10y ago

One of my work projects involved building a .NET runtime with LLVM. It evolved from conservative garbage collection through to an advanced precise collector supporting a multithreaded runtime.

The conservative collector required no support from the runtime, since conservative collectors operate by treating every word-sized value on the stack and heap as a potential reference. All you need for a conservative collector is to know where your stack starts and ends, where your heap begins and ends, and where your globals are.

We went through a couple designs for precise collectors and they all used the same meta-information. A precise collector needs to know what is a reference and what is not so your compiler needs to emit stack maps (tables of offsets where, given your PC in a function, all the live objects are), global root information, as well as object layout information.

Sometimes this becomes more complex. Object layout information included both location of references within type instances as well as whether or not an object is an array. Arrays of value types require both.

.NET also has a type of reference, a managed pointer, which can be used to refer to the middle of an object, such as a pointer to a value type in the middle of an array, and which contributes to the parent object's liveness information.

To sum up, conservative garbage collection can be easy to add. Precise garbage collection much less so. Language specifics can throw a big wrench into things.
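To make the conservative/precise distinction above concrete, here is a minimal sketch in Python over a simulated memory. It is a toy model, not how any real runtime is implemented: the heap is a dict of address to words, and every name in it (`HEAP`, `STACK`, `STACK_MAP`, `LAYOUT`, `mark`) is made up for illustration.

```python
# Toy model of the two approaches. Memory is a dict: address -> list of words.
# Every name here (HEAP, STACK, STACK_MAP, LAYOUT, ...) is hypothetical.

HEAP = {
    0x1000: [0x2000, 7],   # field 0 is a reference to 0x2000, field 1 is an int
    0x2000: [42, 0],       # no references
    0x3000: [0x1000, 0],   # garbage: nothing live refers to it
}

# Slot 0 holds a real reference. Slot 1 holds an integer whose value
# happens to equal a heap address. Slot 2 is a small integer.
STACK = [0x1000, 0x3000, 5]

# What a precise compiler would emit: for a given program counter,
# which stack slots hold live references...
STACK_MAP = {0x40: [0]}
# ...and, per object, which fields are references (object layout info).
LAYOUT = {0x1000: [0], 0x2000: [], 0x3000: [0]}


def conservative_roots(stack, heap):
    """Any stack word that looks like a heap address is treated as a root."""
    return {word for word in stack if word in heap}


def precise_roots(stack, stack_map, pc):
    """Only the slots the stack map lists for this pc are roots."""
    return {stack[slot] for slot in stack_map[pc]}


def mark(roots, heap, layout=None):
    """Return the set of reachable addresses.

    With layout=None every word of every object is scanned (conservative);
    otherwise only the fields listed in layout are followed (precise).
    """
    marked, worklist = set(), list(roots)
    while worklist:
        addr = worklist.pop()
        if addr in marked or addr not in heap:
            continue
        marked.add(addr)
        words = heap[addr]
        fields = range(len(words)) if layout is None else layout[addr]
        worklist.extend(words[i] for i in fields)
    return marked


conservative_live = mark(conservative_roots(STACK, HEAP), HEAP)
precise_live = mark(precise_roots(STACK, STACK_MAP, 0x40), HEAP, LAYOUT)
```

In this toy setup the conservative pass keeps the dead object at `0x3000` alive because an integer on the stack happens to look like its address, while the precise pass, which needed the extra metadata, does not.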

cfallin · 10y ago

If you want an imprecise (conservative) garbage collector, the GC and the compiler are almost disjoint. The GC only needs to know where the heap and stack are, and be able to capture register values at GC time. In other words, it's a runtime-library issue, not a compiler issue. (See the Boehm GC for an example that works with a stock C compiler.)

If you want a precise garbage collector, you need type information to know which heap and stack slots and registers are pointers. This is much harder because your compiler needs to emit metadata for each struct and then keep stack frame info or somesuch at the GC safe points so the runtime knows what the thread's roots are.

Most of the precise GCs I know of are in JIT'd or interpreted languages where you have this type info anyway... AOT-compiled Lisp is the one counterexample I can think of, but usually Lisps solve this in a different way by using tagged pointers (so you know if any heap slot is a pointer by e.g. its LSB).

eslaught · 10y ago

The issue is not having the type info available. AOT compilers certainly have that (or at least, enough of it to get the job done).

The issue is that most compilers with precise GC---AOT or otherwise---are built around that assumption from the very beginning. Therefore, they choose representations, algorithms, and optimizations that are amenable to GC, and in particular don't destroy the type info necessary to make GC work.

It is possible in theory to do the same with LLVM, but difficult in practice. Most existing LLVM-based GC'd languages compromise on performance by requiring LLVM to "root" pointers, effectively forcing them to stay on the stack (instead of registers) so that the GC can see them when it needs to. LLVM's optimizations (particularly at the machine-specific level) were previously allowed to do arbitrary things with register-based pointers, which is obviously bad for a GC if no copies of those pointers exist on the stack. It just takes a long time to take a code base that large and untangle the code from such basic assumptions.

This appears to be changing, as the release notes for LLVM 3.6 indicated. But it will take time. In the meantime, languages on LLVM are either going with conservative GCs, or with precise GCs with the performance degradation noted above (e.g. OCaml), or with no GC (e.g. Rust for the time being).

Source: My talk on precise GC for Rust (from 2012): https://github.com/elliottslaughter/rust-gc-talk
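The "rooting" described above can be pictured as a shadow stack: every function writes its live references into an explicit frame that the collector can walk, rather than leaving them wherever the optimizer likes. The Python below is only a sketch of that idea under that assumption, not LLVM's actual mechanism; `ShadowStack`, `build_pair`, and the `safepoint` callback are hypothetical names.

```python
class ShadowStack:
    """Explicit, collector-visible list of frames; each frame holds root slots."""

    def __init__(self):
        self.frames = []

    def push_frame(self, num_roots):
        frame = [None] * num_roots
        self.frames.append(frame)
        return frame

    def pop_frame(self):
        self.frames.pop()

    def roots(self):
        """Everything the collector is allowed to assume is live."""
        return [ref for frame in self.frames for ref in frame if ref is not None]


def build_pair(shadow, allocate, safepoint):
    """Allocate two objects, storing each into a root slot immediately.

    The extra stores are the cost: the references must live in the frame
    (memory), not only in a local (register), at every safepoint.
    """
    frame = shadow.push_frame(2)
    try:
        frame[0] = allocate("left")
        safepoint()                 # a collection here can see "left"
        frame[1] = allocate("right")
        safepoint()                 # a collection here can see both
        return frame[0], frame[1]
    finally:
        shadow.pop_frame()


shadow = ShadowStack()
snapshots = []  # what a collector would have seen at each safepoint
pair = build_pair(
    shadow,
    allocate=lambda name: {"name": name},
    safepoint=lambda: snapshots.append([obj["name"] for obj in shadow.roots()]),
)
```

Each safepoint sees exactly the references that were stored into the frame, and the frame is popped on return. The forced memory traffic on every allocation is the performance compromise the comment refers to.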

mafribe · 10y ago

> Most of the precise GCs I know of are in JIT'd or interpreted languages

GC is orthogonal to JITing. OCaml and Haskell are two examples of languages with non-JIT AOT compilers (ocamlopt and ghc, respectively) that do precise GC. I'm sure there are plenty of others.

ufo · 10y ago

What about all the ML-family functional languages? JIT compilation is a fairly recent innovation and I don't think it's accurate to say that garbage collection is restricted to JIT compilers.

kragen · 10y ago

AOT-compiled Lisp generally retains the type info you need at runtime in one way or another.

A precise garbage collector for a normal-ish implementation of C, assembly, or Forth is potentially very difficult, though.

aidenn0 · 10y ago

Even SBCL, which is an AOT Lisp, runs into this issue if you share the C stack (needed for FFI) with the Lisp control stack. On targets with few registers (i.e. not SPARC or Power) it uses a conservative GC that pins any memory pointed to on the control stack, and a precise GC for all heap objects.

_delirium · 10y ago

Relevant docs: http://llvm.org/docs/GarbageCollection.html

felixangell · 10y ago · 3 in thread

A nice introduction. I'm currently working on a compiler which uses LLVM for code generation, and it was very difficult to get into at first, especially since I was using C (now Go), so I would have to work through two or three layers of language and documentation. (It was just C to C++; now it's Go, to C, to C++.)

If you're interested in learning more about LLVM, there are some good open source projects that use it. If you aren't using C++, people have also ported the Kaleidoscope tutorial project to Haskell, Rust, C, etc. Additionally, a lot of bigger compilers like Rust and Clang use it. Swift also uses LLVM, and should be open source soon?

kd0amg · 10y ago

Working with the OCaml bindings feels similar. A lot of the functions are explained as wrapping some particular function from the C++ code, which is itself only explained in the docs with a link to its source implementation.

slimsag · 10y ago

Because I'm curious, could we have a link to your Go project if it's open-source? :)

andars · 10y ago

https://github.com/ark-lang/ark

tmostak · 10y ago · 3 in thread

This is a great intro into the subject.

We use LLVM at MapD (http://mapd.com) to compile SQL queries to CPU and GPU machine code - it has given us a major boost over an interpreter-based approach. For more see here - http://devblogs.nvidia.com/parallelforall/mapd-massive-throu.... If you have a background in LLVM or compilers in general and are interested in tackling problems like this please reach out at jobs@mapd.com.

raincom · 10y ago

Daytona (http://www.research.att.com/projects/Daytona/index.html), mainly developed by Rick Greer in the 1990s, does the same thing. They implemented a superset of SQL, and this SQL is compiled into machine code. In those days Daytona was used to parse call records for AT&T. They were processing terabytes of data when the average laptop had a 20GB hard drive. Rick wrote a paper titled "Daytona And The Fourth-Generation Language Cymbal".

You should contact Rick if you guys need to pick his brain. He is a very down-to-earth man. He is retired and lives in the Morristown, NJ area.

shepardrtc · 10y ago

That link is down, but I found a whitepaper on it:

http://web2.research.att.com/export/sites/att_labs/projects/...

gct · 10y ago

Are you guys entertaining remote developers?

johntyree · 10y ago · 3 in thread

> LLVM is nicely written: its architecture is way more modular than other compilers. Part of the reason for this niceness comes from its original implementor, who is one of us.

What on earth is that supposed to mean?

minimax · 10y ago

LLVM was started by Chris Lattner while he was a graduate student at UIUC.

couchand · 10y ago

Presumably a grad student.

easytiger · 10y ago

a worrying need to create a clan that does not exist

deanstag · 10y ago · 2 in thread

For the past year, I had two projects which required me to work with LLVM. Because of the scarcity of articles/documentation, I found it really hard to get into it. The API docs and a few articles by Eli Bendersky (thank you so much!!!) were all I found useful.

But once I got past the basic hurdles, the LLVM project code was so well written that I felt a certain pleasure working with it.

david-given · 10y ago

I've used LLVM a few times to generate code, and it's pretty easy to use (and moderately stable between revisions). e.g. http://cowlark.com/calculon is a function evaluation language I wrote as a sort of cheap shader language for a ray tracer; the whole thing is 4000 lines of C++ header.

I have also attempted to port LLVM to new CPU architectures. This is a totally different deal --- it's poorly documented, very hard to make progress with, and peculiarly unfinished. (e.g. the pattern matcher language, which is excellent, is weirdly unable to match certain types of pattern, which requires you to write big chunks of C++ which manually look for particular DAG patterns and convert them into other patterns.) Debugging is a pain; get a pattern wrong and it'll just hang as the pattern matcher state machine goes insane, or else produce weird, contorted and incomprehensible error messages.

I would say that it's probably about as painful as gcc, although the pain points are in a very different place. And LLVM actually has people and momentum behind it, while the gcc mailing lists are dead quiet.

I'm currently investigating libfirm, which is the open source C compiler nobody's ever heard of. It looks really rather nice, and builds in less time than LLVM takes to run its configure script...

lorenzhs · 10y ago

libFirm is a very interesting project; I know some of the people who are working on it (we're in the same building). What they're doing is quite cool.

PSeitz · 10y ago · 1 in thread

"backed by the largest company on Earth."

Wal-Mart?

_delirium · 10y ago

By market cap (or enterprise value), Apple's the largest public company now, worth about $700b, which is about 2x as much as the next biggest (Exxon, Microsoft, etc.). Although Saudi Aramco is probably the world's most valuable company if non-publicly-traded companies are included.

asb · 10y ago

This is (in my humble opinion) a fantastic introduction, and relevant to a much wider audience than the title implies. If you want to keep up to date with LLVM developments, you might also be interested in my LLVM Weekly newsletter (http://llvmweekly.org/). I try to highlight interesting commits, mailing list discussions, and blog posts (tips and submissions always welcome!).

mtweak · 10y ago

If you're really interested in this stuff and looking for an opportunity, we're looking for talented LLVM developers for our Austin location.

http://www.bitfusion.io/jobs.php

Mazhar Memon, Bitfusion.io

fish2000 · 10y ago

This is indeed a great intro article.

For those interested in seeing examples of LLVM hacking in action, I would recommend reading the source for Halide – https://github.com/halide/Halide – which is an image-processing DSL implemented in C++ and piggybacking on LLVM. I myself learned a lot about LLVM this way.

jnordwick · 10y ago

Not just for grad students! I wish I would have seen this a week ago. LLVM passes are surprisingly readable too.

valgaze · 10y ago

This isn't germane to the main topic but this paper they cite about using a compiler pass to verify OS security ("Protecting Applications from Hostile Operating Systems") is pretty darn interesting: http://sva.cs.illinois.edu/pubs/VirtualGhost-ASPLOS-2014.pdf

noreasonw · 10y ago

A long time ago I read a post by Matthew Flatt about LLVM and gcc and how mini optimization and selling points were important. I'm not able to find the post, but it was an interesting read.

dadrian · 10y ago

This would have been great during my grad school compilers class.
