> [...] variety of front ends: languages with compilers that use LLVM include ActionScript, Ada, C#, Common Lisp, Crystal, CUDA, D, Delphi, Dylan, Fortran, Graphical G Programming Language, Halide, Haskell, Java bytecode, Julia, Kotlin, Lua, Objective-C, OpenGL Shading Language, Ruby, Rust, Scala, Swift, Xojo, and Zig.
Sometimes I think mention of Rust in the title just gets upvotes without even reading the article, here.
In poorly performing languages like Python and Ruby, calling a method on a value (like sum(), or even the += operator) essentially means looking up the method name in a hash table and then making a dynamic call through the resulting function pointer. So this sort of optimization cannot be done in an ahead-of-time compiler unless all objects are created locally and thus have known method dictionaries (and even then, it depends on whether those methods are available to the compiler in IR form, whether the lookup code is inlined, and whether the resulting code can actually be simplified).
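To make that cost concrete, here is a rough Rust sketch of what such a dynamic call amounts to. The HashMap-based method table and the "sum" method name are my own illustration of the pattern, not how CPython or MRI are actually implemented:

```rust
use std::collections::HashMap;

// A toy "object" whose methods live in a per-object dictionary,
// roughly mimicking dynamic method dispatch.
struct DynObject {
    value: i64,
    methods: HashMap<&'static str, fn(&DynObject) -> i64>,
}

fn sum_to(obj: &DynObject) -> i64 {
    (1..obj.value).sum()
}

fn main() {
    let mut methods: HashMap<&'static str, fn(&DynObject) -> i64> = HashMap::new();
    methods.insert("sum", sum_to);
    let obj = DynObject { value: 10, methods };

    // Every call pays for a hash lookup plus an indirect call; an
    // AOT compiler cannot fold this away without proving what the
    // table contains at the call site.
    let method = obj.methods["sum"];
    println!("{}", method(&obj));
}
```

The point is that until the lookup is resolved, the compiler sees only an opaque function pointer, so it has nothing to inline or simplify.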
Well, you do need something very much like this if you want to support "openly extensible" classes in a backward- and forward-compatible way. Otherwise you end up with something like the C++ ABI hell, where changing things around in a base class breaks every binary that links to earlier versions of that class. This is only a "poorly-performing" pattern insofar as the developer has no way to "opt out" of that extensibility when that's the sensible choice.
TruffleRuby has been constant folding trivial code like this since 2014. AOT is also an option.
https://rust-lang.github.io/rustc-guide/miri.html
So while, yes, in general pretty much all Rust optimizations are actually due to LLVM, this one in particular might not be.
Byte for byte identical assembly.
.NET Core and .NET Framework do not.
What I’m interested in is how the IL code from the compiler front ends for each of those languages is made well-optimizable by LLVM (in practical situations, not just a few lines of simple numerical code). I’ve heard that you need to be careful to encode the IL in the right way so that LLVM does not generate needless memcpys or something, but I’m not that much of a compiler expert...
The zero-cost abstraction is the fact that you get the same output, special optimization and all, whether you write the original loop, the "(1..n).fold(0, |x,y| x + y)" variant, or the "(1..n).sum()" variant. Rust is converting those to intermediate code that is the same, or close enough, that LLVM is able to apply the same optimization.
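A minimal check of that claim; the three function names are mine, and this only verifies that the variants agree on results, not that the generated assembly is identical:

```rust
// Three ways to sum the range 1..n; the claim is that rustc lowers
// all of them to equivalent IR, which LLVM then optimizes identically.
pub fn sum_loop(n: i32) -> i32 {
    let mut total = 0;
    for i in 1..n {
        total += i;
    }
    total
}

pub fn sum_fold(n: i32) -> i32 {
    (1..n).fold(0, |x, y| x + y)
}

pub fn sum_iter(n: i32) -> i32 {
    (1..n).sum()
}

fn main() {
    for n in [0, 1, 10, 1000] {
        assert_eq!(sum_loop(n), sum_fold(n));
        assert_eq!(sum_fold(n), sum_iter(n));
    }
    println!("all three variants agree");
}
```

Pasting these into Compiler Explorer with optimizations on is the easy way to compare the actual codegen.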
Would it be better if some sample code had been chosen that wasn't special-cased to the degree that the entire algorithm was replaced wholesale? Probably. It doesn't invalidate the premise, though; it just slightly obscures it.
What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better.
Rust iterators fit in the second part.
1: https://boats.gitlab.io/blog/post/zero-cost-abstractions/
Compilers are awesome, check out Matt Godbolt's talk on this very topic: https://www.youtube.com/watch?v=nAbCKa0FzjQ
I can't figure out why it doesn't use the simpler formula (other than the optimizer being bad).
Seeing what

    pub fn sum3(n: i32) -> i32 {
        (1..n).sum::<i32>() + (1..2*n).sum::<i32>() + (1..(n + 2)).sum::<i32>()
    }

does would be more interesting to me. Also, while all the inlining is rustc's doing, I assume the "triangle number trick" is LLVM's.
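The "triangle number trick" referred to here is Gauss's closed form: for the exclusive range 1..n, the sum is n*(n-1)/2. A quick sanity check of the identity itself (this snippet is mine; it only verifies the arithmetic, not what shape of code LLVM actually emits):

```rust
// Gauss's closed form for the sum of the exclusive range 1..n.
fn triangle(n: i64) -> i64 {
    n * (n - 1) / 2
}

fn main() {
    // Compare against the straightforward iterator sum, including
    // the edge cases n = 0 and n = 1 where the range is empty.
    for n in 0..100 {
        assert_eq!((1..n).sum::<i64>(), triangle(n));
    }
    println!("closed form matches iterator sum for n in 0..100");
}
```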
"The Implementation of Functional Programming Languages"
"Compiling with Continuations"
"Modern Compiler Implementation in ..." (C, Java and ML variants)
Although oriented towards Lisp, "Lisp in Small Pieces" is a classic as well, with many optimizations such as inlining lambda calls across multiple call levels.
Just generally curious.
Why disappointed? It just highlights the quality of Rust's approach.
This happens all too often: once the code changes enough that the optimizer doesn't recognize the pattern anymore, it throws up its hands, and you're on your own. Some people call this optimizer roulette.
It's not just compilers, either. CPUs have their own peephole optimizers and patterns they recognize (or don't), and that can easily make a 2x difference in your run time depending on whether the CPU cottons to what you're trying.
The days of Z80, 6502 and similar are long gone.