Making Julia as Fast as C++ (2019)

(flow.byu.edu)

86 points · d_tr · 1mo ago · 62 comments

StilesCrisis · 1mo ago · 16 in thread

Punchline: rewrote the code to look almost identical to C++, hand-held the compiler by adding @-marks to disable safety checks, forced SIMD codegen and fastmath on.

End result: code that is uglier and still much slower than C++. Kind of a shame.

celrod · 1mo ago

I was once a bit of a Julia performance expert, but moved toward C++ for hobby projects even while still using Julia professionally.

I wrote a blog post at the time with exactly that punchline (not explicitly stated, but just look at the code!): https://spmd.org/posts/multithreadedallocations/ The example was similar to a real production-critical hot path from work.

Maybe things changed since I left Julia, but that was December 2023, four years after this blog post.

arbitrandomuser · 1mo ago

Hey, what happened to LoopModels?

1 more reply

SatvikBeri · 1mo ago

This is 7 years old. Julia is a totally different language by now.

As a quick anecdote, in our take-home interview exercise, we usually receive answers in C++ or Julia, and the two fastest answers have been in Julia.

HarHarVeryFunny · 1mo ago

I'd have to guess that this is because of ease of use. C++ lets you get as close to the metal as you choose to, so there is no reason why a C++ solution shouldn't be at least as fast as one written in any other language, and yet ...

Of course it also depends on what additional libraries you are using, especially when it comes to parallel/GPU programming in C++, but it is easy to believe that Julia out of the box makes it easy to write high-performance parallel software.

2 more replies

d_tr · OP · 1mo ago

> This is 7 years old.

Yeah, I actually totally forgot to check the date...

neutrinobro · 1mo ago

Hardly seems worth the effort; perhaps things have improved since 2019. It would be interesting to see an updated benchmark, but if you're going to end up with code that looks like C++ to get proper performance, you might as well write it in C++.

My biggest problem with Julia is that they decided to use column-major indexing for multi-dimensional arrays (i.e. FORTRAN/MATLAB style). This makes interoperability with C/C++ and Python's NumPy a real pain, since you can't do zero-copy array sharing between the two without one side being forced into strided access. For that reason alone I haven't adopted it in any of my workflows.

adrian_b · 1mo ago

Actually the column-major order of Fortran is more efficient for some linear algebra operations than the order of C, which has been inherited by many modern languages that do not care about high performance in scientific computations.

So I would say that the culprit for interoperability is C and its descendants, not Fortran or Julia. The designers of C and of the languages that have imitated C gave no thought to which order for multi-dimensional arrays is better, so the users of such languages have no right to blame interoperability problems on other languages that did the right thing. Even if the Fortran order had not been better, it had already been in use for 20 years before C, so there was no reason to choose a different order.

C has chosen to store arrays in the order in which they are typically read by humans when written on paper, but this is a choice like the choice between big-endian and little-endian, where big-endian was how Europeans wrote numbers, but little-endian is more efficient on computers.

An example of why column-major order is preferable is the matrix-vector product, i.e. the evaluation of a function that maps linear spaces.

The matrix-vector product should not be done as it is typically taught in schools, by scalar products of rows of the matrix with the vector, because this is less efficient: it makes more memory accesses. The right way to compute a matrix-vector product is by doing AXPY operations between columns of the matrix and the vector operand (segments of the output of the AXPY operations are held in registers until all partial AXPY operations are accumulated, avoiding memory accesses). In this case, you need to read columns of the input matrix for each AXPY operation, which is much more efficient when the elements of a column are stored compactly in memory, avoiding the need for strided accesses.
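A minimal sketch of the two loop orders described above, in plain Python. The thread itself contains no code, so the function names `matvec_dot` and `matvec_axpy` and the flat column-major list layout are assumptions made purely for illustration. Both compute y = A·x for an m×n matrix stored column by column; the dot-product version reads the matrix with stride m, while the AXPY version walks each column contiguously. Pure Python will not reproduce the speed difference, only the access pattern.

```python
def matvec_dot(a, m, n, x):
    """y = A @ x via one dot product per row; `a` is a flat column-major list."""
    y = [0.0] * m
    for i in range(m):
        acc = 0.0
        for j in range(n):
            acc += a[i + j * m] * x[j]  # consecutive reads are m elements apart
        y[i] = acc
    return y


def matvec_axpy(a, m, n, x):
    """y = A @ x as a sum of AXPY updates: y += x[j] * A[:, j]."""
    y = [0.0] * m
    for j in range(n):
        xj = x[j]
        base = j * m  # column j occupies a[base : base + m]
        for i in range(m):
            y[i] += xj * a[base + i]  # unit-stride reads down one column
    return y


# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
A = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
x = [1.0, 0.0, -1.0]
y_dot = matvec_dot(A, 2, 3, x)
y_axpy = matvec_axpy(A, 2, 3, x)
```

Both calls return the same vector; the only difference is the order in which the elements of `A` are touched.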

The same thing happens for matrix-matrix products, which must not be done in the naive way taught in schools, by scalar products of rows of the first matrix with columns of the second matrix, but it must be done by tensor products of columns of the first matrix with rows of the second matrix.
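The outer-product formulation can be sketched the same way, again in plain Python with an assumed flat column-major layout and a hypothetical function name `matmul_outer`. For each inner index p, column p of the first matrix is combined with row p of the second, and the resulting rank-1 update is accumulated into the output. Real BLAS-style kernels add blocking and register tiling, which this sketch leaves out.

```python
def matmul_outer(a, b, m, k, n):
    """C = A @ B as a sum of k rank-1 (outer-product) updates.

    A is m x k, B is k x n, C is m x n; all are flat column-major lists.
    """
    c = [0.0] * (m * n)
    for p in range(k):
        for j in range(n):
            b_pj = b[p + j * k]  # element B[p, j] from row p of B
            for i in range(m):
                # A[i, p] is read with unit stride down column p of A
                c[i + j * m] += a[i + p * m] * b_pj
    return c


# A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], both column-major.
A = [1.0, 3.0, 2.0, 4.0]
B = [5.0, 7.0, 6.0, 8.0]
C = matmul_outer(A, B, 2, 2, 2)  # column-major form of [[19, 22], [43, 50]]
```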

1 more reply

csvance · 1mo ago

Just reverse the axes on one side, typically the Julia side. This is the convention used in Lux.jl/Flux.jl. I share memory between the two with zero additional copying for my workflows on a daily basis. If you are really allergic to doing this, I’m sure it’s possible to use metaprogramming / the type system to write it the same way in both places with zero performance overhead.
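Why reversing the axes works can be checked with index arithmetic alone. The sketch below uses no Julia or NumPy API; the helper names `col_major_index` and `row_major_index` are hypothetical. It shows that element (i, j) of a column-major `rows x cols` array sits at the same flat offset as element (j, i) of a row-major `cols x rows` array, so the same buffer can be shared without copying as long as one side swaps its axis order.

```python
def col_major_index(i, j, rows, cols):
    """Flat offset of element (i, j) in a column-major rows x cols array."""
    return i + j * rows


def row_major_index(i, j, rows, cols):
    """Flat offset of element (i, j) in a row-major rows x cols array."""
    return i * cols + j


rows, cols = 3, 4

# Element (i, j) of the column-major 3x4 array is element (j, i)
# of a row-major 4x3 view over the same buffer.
same_layout = all(
    col_major_index(i, j, rows, cols) == row_major_index(j, i, cols, rows)
    for i in range(rows)
    for j in range(cols)
)
```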

brabel · 1mo ago

> code that is uglier and still much slower than C++.

Oh such a shame indeed! They didn’t even manage to produce better looking code at least?? Julia was looking great in 2019 but it was very buggy still so I stopped looking. Had hopes that by now it would be a good choice over C++ and Rust with similar performance.

cmrdporcupine · 1mo ago

There's simply no way it'd ever have similar performance to those. It's not possible.

I have always seen it as a potential alternative to Java, and definitely better than Python.

My experience working in it professionally was that it was... fine. But the GC in it was not good under load and not competitive with Java's.

2 more replies

drnick1 · 1mo ago

Came here to say that. It's just easier to write C++ in the first place, and LLMs now make this easier than ever.

2ndorderthought · 1mo ago

I don't get the appeal. It's like an OSS MATLAB, but all contributions are used directly so the language developers can make money for a parent company? Most OSS languages aren't run that way. Seems kind of scammy.

KenoFischer · 1mo ago

It always amuses me when people assume that the nefarious scheme is taking open source contributions and selling them. That's not the nefarious scheme. The nefarious scheme is going to partners, funding agencies and investors and saying "look at this unique capability / important research / profitable business opportunity that we can do together, but oops, all of our code is written in Julia, so I guess we better pay some people to maintain it or it'll all come crashing down, wouldn't want that to happen".

Also, I'm of course using nefarious in jest here in both cases. While we don't directly try to monetize our open source work, I respect that sometimes people need to do that. As long as people are transparent about it, I don't have a problem. Doing the thing we're doing seems to work, but it's a lot harder, because you have to build a successful piece of software and a (or multiple) successful something elses that has a critical dependency on it. It's like hitting the lottery twice.

1 more reply

csvance · 1mo ago

Your baseline for comparison is a company that doesn't give anything away for free?

Also, contributing to open source is a choice, not a mandate. I greatly benefit from Julia and its ecosystem so I chose to contribute back some of my work; no one forced me. I chose the MIT license because I want other people to be able to make money with it, just like I make money with other people's MIT-licensed stuff.

postflopclarity · 1mo ago

The parent company is a consumer of Julia, and has no formal role in oversight or governance; they are of course invested in the success and performance of the language, but so are all other users!

1 more reply

andyferris · 1mo ago

Meh, I’ve never been associated with the company and AFAICT they provide value through platforms for enterprises. Not everyone gets OSS sponsorships to fund a team (and using a social media presence to achieve this was a post-Julia phenomenon).

It’s nothing like Google-the-ad-company influencing Chrome. The company consumes Julia for products to sell, rather. Maybe this affects the ordering of features landing, but… meh.

kmaitreys · 1mo ago · 10 in thread

I really like Julia as a language but I have struggled to adopt it and be productive in it. Part of it is because of the JIT runtime and a sub-par LSP (at least when I last tried).

To those who regularly write Julia code, what is your workflow? The whole thing with Revise.jl did not suit me, honestly. I have enjoyed programming in Rust orders of magnitude more because there's no runtime and you can do AOT. My intention is not to write scripts, but high-performance numerical/scientific code, and with Julia's JIT-based design, rapid iteration (to me at least) feels slower than Rust (!).

jakobnissen · 1mo ago

The boring answer is that I don’t use huge dependencies that take minutes to compile, and I don’t lean on the LSP - I tend to put more effort into reading the code.

In my experience you really gotta work with the tools the language gives you. Julia gives you Revise, so it’s a bit of a handicap not using it. Maybe analogous to writing Rust without an LSP.

I get that leaning on the LSP can become a habit, and also that the Julia LSP is quite poor, but I find it wild that rapid iteration for you is faster in Rust. I write Rust as well and can’t imagine how that would be the case.

kmaitreys · 1mo ago

A lot of people have focussed on the LSP in their replies when it was only one of the problems I mentioned.

rust-analyzer is a great LSP and paired with clippy it can teach you the language itself. Also, writing numerical code is extremely easy in Rust. I can write code and just run cargo run to see the output. Julia, on the other hand, forced a REPL-based workflow which has never made sense to me. A REPL-based workflow makes sense when you just want to do some script stuff. But when writing code which will run for a long duration on an HPC? I don't get it. Part of the problem is I'm not "holding it correctly", but again, the out-of-the-box experience isn't good. You define a struct and later add or remove a field from it. Often you'll get an error because Revise.jl didn't recompile things. It was a sub-par experience and I was hoping people would share their dev workflow in more detail.

1 more reply

SatvikBeri · 1mo ago

Well, my workflow uses Revise.jl. I develop either in Jupyter notebooks or in the REPL, prototyping code there and then moving functions to files when they're ready. In that context, rapid iteration is fairly fast.

Nowadays I often use Claude Code, working with a Julia REPL in a tmux or zellij session via send-keys. I'll have it prototype and try to optimize an algorithm there, then create a notebook to "present its results", then I'll take the bits I like and add them to the production codebase.

kmaitreys · 1mo ago

How do you develop a program which will run for a long duration on HPCs? How do you quickly modify struct definitions? How do you define imports (using vs include syntax is so confusing!)?

REPL-based workflow doesn't make sense to me other than scripting work.

2 more replies

arbitrandomuser · 1mo ago

Yup, the LSP is bad. A new LSP is being written based on JET.jl, a static code analyzer. This should be faster than the old LSP, which kind of runs by loading all the modules into a Julia instance and querying it for symbols and docs (I'm not 100% sure, but I think that's how it works).

thimotedupuch · 1mo ago

Exactly! The new LSP is getting ready https://github.com/aviatesk/JETLS.jl/ with one of the compiler devs working hard on it. I tried it with VSCode, Zed and Helix and it's more than fine already.

I hope Julia developer tools will one day match the best of what other programming languages have to offer.

tombert · 1mo ago

Just an FYI...Claude is actually really good at building LSP servers [1].

If you want a better Julia LSP, you might just be able to get Claude or Codex to build one for you. I've been impressed with the TLA+ bindings it generated.

[1] https://github.com/Tombert/TLA-Language-Server-Protocol

paddim8 · 1mo ago

What's the problem with the JIT runtime? Why is rapid iteration slower with JIT? Just-in-time compilation isn't inherently slower and is normally faster than AOT for dynamic languages, and even for static languages that have some dynamic features like dynamic dispatch.

lelanthran · 1mo ago

> Part of it is because of the JIT runtime and a sub-par LSP (at least when I last tried)

Good LSPs do the autocompletion, sub-par ones don't.

Is it really such a good idea to have every single automated aid turned on when picking up a new language?

How will you learn if you cannot get feedback on what you did wrong?

I mean, until you learn multiplication, maybe don't use the calculator.

Once you learn it then you get a small speed increase, but if you are new to something, LSP autocompletion is going to slow down your learning.

kmaitreys · 1mo ago

I think LSPs like rust-analyzer are very good tools to learn the language itself. I think I learnt Rust solely through LSP and clippy.

FattiMei · 1mo ago · 2 in thread

Very interesting post and I think this exposes the limitations of the Julia compiler. Note that an old version of the compiler is used (1.0.3 from 2019).

One could say that we can almost replicate the semantics of a C++ program while writing in Julia. For example, we can remove bounds checks on arrays or remove hidden memory allocations.

But the goal of a language for numerical computing is capturing the mathematical formulas using high-level constructs closer to the original representation while compiling to efficient code.

Domain scientists want to play with the math and the formulas, not do common subexpression elimination by hand in their programs. Just curious to see how it evolves.

northzen · 1mo ago

I think the best compromise would be to get the best of both worlds. By default, perform bounds checks, but have a compiler flag which skips them. It might break many programs written with the default behaviour in mind, but it would allow additional optimizations.

postflopclarity · 1mo ago

This is exactly what Julia does. Bounds checks are on by default, and there are ways to disable them, either locally via the `@inbounds` macro or globally with the `--check-bounds=no` flag.

ekjhgkejhgk · 1mo ago · 1 in thread

Phew. 7-year-old post about a 10-year-old language. Triggers all the LLMs posting empty generic responses: "Very interesting, exposes limitations...".

Prelude of what's to come in the self-reinforcing cycle of machines talking to machines and drowning everything else.

kelipso · 1mo ago

It's a very predictable pattern I swear. Thought it was a mostly reddit thing but dead internet theory looking more and more real even here.

mgkuhn · 1mo ago · 1 in thread

I'm always surprised when people describe Julia syntax as "Pythonic": Julia's syntax was clearly inspired by MATLAB rather than Python.

And that's a good thing, because Python+NumPy syntax is far more cumbersome than either Julia or MATLAB's.

You can see this at a glance from this nice trilingual cheat sheet:

https://cheatsheets.quantecon.org/

SatvikBeri · 1mo ago

It's definitely closer to MATLAB than Python, but it's closer to Python than most mainstream programming languages. I ported ~20k lines of Python code to Julia manually over a couple of years, and for the most part could do line-by-line translations that worked (but weren't necessarily performant until I profiled and switched to using Julia idioms).

Syzygies · 1mo ago

Julia is reasonably fast. I returned to a language comparison project specific to my math research, to see how I might do better. My agents and I studied the advice in the post and various more recent links from the comments, but we were already mostly on target and nothing left moved the needle.

My work is more combinatorial. Julia does excel at numerical computation. There's a tribal divide in math between people who can't go 30 seconds away from the real or complex numbers, and those whose tolerance is about that long. I try to keep an open mind, but I'm closer to the second camp. Julia is good enough to consider either way.

A development in recent months: AI can assist in general-purpose Lean 4 programming, no longer getting confused by the dominant proof-oriented training corpus. If one is a functional programmer who believes that Haskell was on the right track, then Lean is the most interesting language choice for shaping one's thoughts. Benchmarks are inherently misleading if a better language makes it possible to express algorithms out of reach of more primitive languages.

https://github.com/Syzygies/Compare

            C++  100    13.08s  ±0.08s
           Rust   99    13.16s  ±0.02s
          Julia   90    14.54s  ±0.01s
             F#   90    14.54s  ±0.04s
  Kotlin-native   88    14.79s  ±0.01s
         Kotlin   86    15.18s  ±0.01s
          Scala   79    16.50s  ±0.08s
   Scala-native   76    17.14s  ±0.02s
            Nim   65    20.17s  ±0.01s
          Swift   64    20.54s  ±0.04s
          Ocaml   52    25.38s  ±0.04s
           Chez   49    26.64s  ±0.02s
        Haskell   37    34.96s  ±0.06s
           Lean   29    45.39s  ±0.15s
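The middle column of the table is not explained in the comment. It appears to be each language's speed relative to the fastest entry, i.e. 100 × (fastest time) / (own time), rounded to the nearest integer. That reading is an inference from the posted numbers, not something the comment states; the short check below recomputes the column under that assumption (the variable names are illustrative).

```python
# (language, posted score, posted mean time in seconds), copied from the table
rows = [
    ("C++", 100, 13.08), ("Rust", 99, 13.16), ("Julia", 90, 14.54),
    ("F#", 90, 14.54), ("Kotlin-native", 88, 14.79), ("Kotlin", 86, 15.18),
    ("Scala", 79, 16.50), ("Scala-native", 76, 17.14), ("Nim", 65, 20.17),
    ("Swift", 64, 20.54), ("Ocaml", 52, 25.38), ("Chez", 49, 26.64),
    ("Haskell", 37, 34.96), ("Lean", 29, 45.39),
]

fastest = min(seconds for _, _, seconds in rows)

# Assumed formula: relative speed as a percentage of the fastest run.
recomputed = {lang: round(100 * fastest / seconds) for lang, _, seconds in rows}
posted = {lang: score for lang, score, _ in rows}

matches = recomputed == posted
```

Under this reading, Julia's 90 means it ran at roughly 90% of the speed of the C++ version on this particular benchmark.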

ForceBru · 1mo ago

Recent discussion on Julia Discourse: https://discourse.julialang.org/t/making-julia-as-fast-as-c/

mgkuhn · 1mo ago

Note that this article is about Julia 1.0.3, whereas today you should consider as obsolete any experience reports involving Julia versions prior to Julia 1.10 (the current LTS version), the most significant milestone in the maturity and usability of the language.

orthogonal_cube · 1mo ago

Dang, haven’t read much on Julia as of late. I remember using it for a CS 300-level course around 2016 when learning about tokenizing and parsing as part of language fundamentals. Julia has undoubtedly made some significant performance improvements since then. Would love to see a follow-up that explores what, if anything, from this still holds true and what improvements can be made.

Woodi · 1mo ago

Over the years there have already been almost identical articles about making a program in language X as fast as C or C++... And the results were exactly the same: write C/C++-style programs!

Why?

Because of CPU architecture: for a given CPU, one just needs to structure the code in a way the CPU can execute efficiently! Is it so surprising that all the sugar and multi-functional smartness carry the cost of all those ifs and loops, like maps? A CPU is just rock stupid and can't do anything else!

That's where all those specialized instructions come from, and programs just need to be structured or compiled the CPU-architecture way to perform as fast as the CPU and the rest of the hardware allow...

And there are some "Java machines", and that is exactly the same story: use the CPU's native language :) As much as possible.

So: give us better CPUs pls :)

kasperset · 1mo ago

I wonder how Mojo ranks alongside Julia. Mojo was discussed here yesterday. Mojo seems to be more Python-focused, while Julia is very much focused on scientific computation. I may be wrong.

vivzkestrel · 1mo ago

- why are all the newer posts on page 1 and page 2 under blog empty? I mean I literally only see the title

- not a single post has anything inside here https://flow.byu.edu/posts/

slwvx · 1mo ago

From 2019
