The Julia implementation ended up being about 2 to 3 times faster. I timed the core learning loops (the network evaluations and the gradient calculations and applications), and PyTorch and Julia performed similarly there. So it wasn't that Julia was faster at learning. Instead, it was all the in-between, all the "bookkeeping" in Python, that ended up being much faster in Julia, enough so that overall it was 2 to 3 times faster.
(I was training on a CPU though. Things may be different if you're using a GPU, I don't know.)
I chose Julia instead. It took 4 hours to get everything sorted out (including getting IT to allow Julia's package manager to actually download anything) and have the first model running, just putting a paper into code. Since the code is just the math written out, this is a vast communication improvement.
After fiddling around with it at home for a week, this was my first professional experience with it, and I'm blown away.
Thanks to everyone who is working on this language!
That requires a linear search through a list of all possible combinations of input argument types.
In a single dispatch language like most object oriented languages, you can do a simple dictionary/hash table lookup. Much faster.
With its JIT, Julia is able to optimize away most of these super slow lookups at runtime. Hence you get multiple dispatch for all functions, but with fantastic performance. Nobody had done that before.
JIT is not new, multiple dispatch is not new, and multiple dispatch + JIT also isn't new, but no existing languages combined them in a way that allows for the fantastic, efficient devirtualization of generic methods that Julia is so good at.
This is why things like addition and multiplication are not generic functions in Common Lisp: it's too slow there because CLOS is not able to efficiently devirtualize the dispatch. In Julia, everything is a generic function, and we use this fact to great effect.
CLOS and Dylan laid a ton of important groundwork for these developments, but they're also not the same.
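A minimal sketch of what that devirtualization buys you (the function and variable names here are illustrative, not from any of the systems above):

```julia
# Two methods of one generic function, selected by multiple dispatch.
area(w::Float64, h::Float64) = w * h
area(w::Int, h::Int) = w * h

# At a call site where the argument types are inferred, Julia's JIT
# compiles a specialization that calls the right method directly --
# no runtime dispatch-table lookup survives in the generated code.
total_area(ws, hs) = sum(area(w, h) for (w, h) in zip(ws, hs))

total_area([2.0, 3.0], [4.0, 5.0])  # specialized for Float64 arguments
```

Running `@code_llvm area(2.0, 3.0)` in the REPL should show little more than a single floating-point multiply, which is the devirtualization at work.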
I guess you could try to hack together some kind of templating feature to make new type-specific classes on the fly, but this won't work well with subtyping. Your templating system could probably have (matrix float) as a subclass of matrix, but not of (matrix real) or (matrix number). I think you'd lose too much in Common Lisp's hodge-podge type system.
A big innovation of Julia was figuring out how to make generic functions and multiple dispatch work well with the kind of generic data structures you need for good performance. And this was not a trivial problem at all. Julia's system lets you write generic numeric matrix code while still having float matrix multiplication done by LAPACK, which seems desirable.
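A quick sketch of what that buys you (the `gram` function is my own illustrative example, not from any library):

```julia
using LinearAlgebra

# One generic definition covers every element type.
gram(A::AbstractMatrix) = A' * A

A = rand(100, 100)          # Matrix{Float64}: `*` dispatches to BLAS
B = rand(1:3, 4, 4) .// 2   # Matrix{Rational{Int}}: generic fallback matmul

gram(A)   # fast BLAS path
gram(B)   # exact rational arithmetic, same source code
```

The same one-line definition hits the LAPACK/BLAS fast path for `Float64` matrices and an exact generic path for `Rational` matrices, which is the combination the comment above is describing.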
The other thing is that Julia is a language where generic functions are a low-level thing used all over the standard library, whereas Common Lisp has a mix: a few CLOS generic functions (documentation is one; there are more in CLtL2), a few "pre-CLOS" generic functions like the mathematical functions, the sequence functions, and to some extent some array functions, and a whole lot of non-generic functions.
Although CLOS and others do support multiple dispatch, Julia seems to take the cake by most metrics, highlighting that it is a core paradigm of the language, more so than in the others.
I just did a “using Plots” in 1.6.0, and it was fast enough to not care about the delta between Plots and, say, R loading ggplot.
Huge kudos to the Julia team.
julia> @time let
           using Plots
           plot([sin, cos])
       end
 11.267558 seconds (17.98 M allocations: 1.114 GiB, 4.83% gc time)
Versus Matlab, which probably takes about 15 seconds just to open the editor, but plotting is very fast:

>> tic
fplot( @(x) [sin(x) cos(x)])
toc
Elapsed time is 0.374394 seconds.
Julia is just about as fast as Matlab after the first run for plotting. [1]

[1] https://www.oxinabox.net/2021/02/13/Julia-1.6-what-has-chang...
But I'll sure give it a try, because Julia hits a sweet spot between expressiveness and speed (at least for the kind of stuff I do: matrix, algorithm, and graph computations).
I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.
> I know that you shouldn't produce garbage, but I happen to like immutable data structures and those work better with optimised GCs.
If you use immutable data structures in Julia, you're rather unlikely to end up with any heap allocations at all. Unlike Java, Julia is very capable of stack-allocating user-defined types.
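A quick sketch of that (the names are mine): an immutable struct whose fields are bits types is itself a plain bits type, so the compiler can keep it on the stack or in registers rather than boxing it on the heap:

```julia
# Immutable, fixed-layout struct: no heap box needed.
struct Point
    x::Float64
    y::Float64
end

sqnorm(p::Point) = p.x^2 + p.y^2

function total(n)
    s = 0.0
    for i in 1:n
        s += sqnorm(Point(i, i))  # a fresh Point per iteration, no allocation
    end
    s
end

total(10)                 # warm up the JIT first
@allocated total(10_000)  # typically reports 0 bytes
```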
Check out StaticCompiler.jl.
Or are there steps to produce a binary (much like Go or C or Rust)?
Making a nice small static binary is technically possible using an approach similar to what GPUCompiler.jl does, but the CPU equivalent of that isn’t quite ready for primetime.
not that my suggestion is good, but what they have now is bad
Will there be an M1 Mac version for 1.7?
> `git clone https://github.com/JuliaLang/julia` and `make` should be enough at this point.
https://github.com/JuliaLang/julia/issues/36617#issuecomment...
For the package ecosystem as a whole, we will be slowly increasing the number of third-party packages that are built for aarch64-darwin, but this is a major undertaking, so I don't expect it to be truly "finished" for 3-6 months. This is due both to technical issues (packages may not build cleanly on aarch64-darwin and may need some patching/updating, especially since some of our compilers, like gfortran, are prerelease testing builds; building for aarch64-darwin means that packages must be marked as compatible with Julia 1.6+ only, due to a limitation in Julia 1.5; etc.) and to practical ones (our packaging team is primarily volunteers, and they only have so much bandwidth to help fix compilation issues).
Julia almost consistently loses to Go, Crystal, Nim, Rust, Kotlin, and Python (PyPy, NumPy): https://github.com/kostya/benchmarks
Is this because of bad typing, or did they not use Julia in an idiomatic manner?
See the docs which kinda read like blog posts: https://juliasimd.github.io/LoopVectorization.jl/stable/
And then replacing the matmul.jl with the following:
@avx for i = 1:m, j = 1:p
    z = 0.0
    for k = 1:n
        z += a[i, k] * b[k, j]
    end
    out[i, j] = z
end
I get a 4x speedup, from 2.72s to 0.63s. And with @avxt (threaded) using 8 threads it goes down to 0.082s on my AMD Ryzen CPU. (So this is not dispatching to MKL/OpenBLAS/etc.) Doing the same in native Python takes 403.781s on this system -- haven't tried the others.

A fair comparison with C++ would be to at least include the compilation/linking time in the time reported.
Ditto for Java or any JVM language (you'd have JVM startup cost but that doesn't count the compilation time for bytecode).
Generally, for stuff like this (scientific computing benchmarks), you want to run a lot of computation precisely to avoid this problem (i.e. you want to fairly amortize the cost of compilation and startup).
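In Julia, the idiomatic way to get that amortization in a measurement is BenchmarkTools.jl, which warms the function up and reports statistics over many samples; a small sketch (the workload function is illustrative):

```julia
using BenchmarkTools

f(x) = sum(abs2, x)   # toy workload: sum of squares

x = rand(1000)

@time f(x)    # first call: includes JIT compilation time
@time f(x)    # second call: just the run

@btime f($x)  # warmed-up samples; compilation and startup are excluded
```

The `$x` interpolation keeps BenchmarkTools from treating `x` as a slow global during the benchmark.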
This type of reaction is seen everywhere in the Julia community. My advice is to embrace negativity around the language, try to understand whether it is fabricated or legitimate, and address the shortcomings.
Julia is a beautiful language, and I hope some of its warts get fixed.
* interesting (not fun) to find out how Julia works
* annoying AF to discover that much of the teaching material was hidden behind some 3rd party website, presumably in videos (I didn't bother to register, but started browsing the manual instead). What's wrong with text?
* unnecessarily complex because the documentation for the basic functions is nearly inaccessible to beginners.
But I managed to get a simple layout system up and running, and it wasn't fast. I rewrote it in Go (the language I'm currently working in most), and it was literally >100x faster. And that shouldn't be due to startup costs, because a backtracker shouldn't have that much JIT overhead.
I think I can now say that I can't see the use case for Julia. "Faster than Python" is simply not good enough, and for the rest there are no redeeming features. Perhaps the fabled partial differential equation module is worth it, but that can get ported to other languages, I guess.
Coming from a Python and C++ background, I found it sufficient to just read the docs and do some Advent of Code problems to get productive in Julia. What videos are you talking about? I found https://docs.julialang.org/en/v1/manual/performance-tips/ to be a pretty good document on why and when Julia can be slow.
Julia's performance should be on par with Go's; if it's slower, read the performance tips in the manual. As for teaching material on 3rd-party websites, I don't know what you mean. The Julia manual is available from the julialang.org website.
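For what it's worth, the very first of those performance tips is about non-constant globals, which in my experience is the most common reason ported code runs far slower than expected; a minimal sketch (the names are mine):

```julia
# Slow: `data` is an untyped global, so its type must be checked
# dynamically on every access inside the hot loop.
data = rand(10^6)
function slow_sum()
    s = 0.0
    for x in data
        s += x
    end
    return s
end

# Fast: pass the data as an argument (or make the global `const`)
# so the compiler can infer its type and specialize the loop.
function fast_sum(v)
    s = 0.0
    for x in v
        s += x
    end
    return s
end

fast_sum(data)
```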
As for re-writing DifferentialEquations, that is extremely strongly tied to the multiple dispatch paradigm, re-writing it would be hard. What you can get is wrappers like diffeqpy and diffeqr, which call out to Julia.
Here's code I ran, with results:
julia> using GalaxyBrain, BenchmarkTools
julia> bench = bf"""
>++[<+++++++++++++>-]<[[>+>+<<-]>[<+>-]++++++++
[>++++++++<-]>.[-]<<>++++++++++[>++++++++++[>++
++++++++[>++++++++++[>++++++++++[>++++++++++[>+
+++++++++[-]<-]<-]<-]<-]<-]<-]<-]++++++++++."""
julia> @benchmark $(bench)(; output=devnull, memory_size=100)
BenchmarkTools.Trial:
memory estimate: 352 bytes
allocs estimate: 3
--------------
minimum time: 96.706 ms (0.00% GC)
median time: 97.633 ms (0.00% GC)
mean time: 98.347 ms (0.00% GC)
maximum time: 102.814 ms (0.00% GC)
--------------
samples: 51
evals/sample: 1
julia> mandel = bf"(not printing for brevity's sake)"
julia> @benchmark $(mandel)(; output=devnull, memory_size=500)
BenchmarkTools.Trial:
memory estimate: 784 bytes
allocs estimate: 3
--------------
minimum time: 1.006 s (0.00% GC)
median time: 1.009 s (0.00% GC)
mean time: 1.011 s (0.00% GC)
maximum time: 1.022 s (0.00% GC)
--------------
samples: 5 evals/sample: 1
Note that, conservatively, GalaxyBrain is about 8 times faster than C++ on "bench.b" and 13 times faster than C on "mandel.b," with each being the fastest language for the respective benchmark. In addition, it allocates almost no memory relative to the other programs, which measure memory usage in MiB.

You could argue that I might see a similar speedup for other languages on my machine, assuming I have a spectacularly fast setup, but this person ran their benchmarks on a tenth-generation Intel CPU, whereas mine's an eighth-generation Intel CPU:
julia> versioninfo()
Julia Version 1.5.1
Commit 697e782ab8 (2020-08-25 20:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
This package is 70 lines of Julia code. You can check it out for yourself here: https://github.com/OTDE/GalaxyBrain.jl

I talk about this package in depth here: https://medium.com/@otde/six-months-with-julia-parse-time-tr...
But note that OP uses larger cells (`int` = 32 bit in the C version, `Int` = 64 bit in the Julia version) while GalaxyBrain seems to use 8 bit cells. Not that I expect this to make a major difference (but perhaps a minor one?)
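For anyone unfamiliar with why cell width matters here: 8-bit cells wrap around at 256 (which some Brainfuck programs depend on) and touch an eighth of the memory per cell that 64-bit cells do. A tiny illustration:

```julia
# 8-bit cells wrap around at 256...
UInt8(255) + UInt8(1)  # 0x00

# ...while 64-bit Int cells just keep counting.
255 + 1                # 256
```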
Second, to write very fast Julia you need to know a lot of "tricks," and in most cases it won't be as easy as writing normal code.
And all the people writing that this benchmark measures compilation time (XD?) or fails to exclude JIT time could just look at the code/readme for 5 seconds before commenting.
Julia is fast and can be as fast as C, but not in all cases and not as easily as it seems.
That's true in literally any language. Some languages require inlined assembly. Others require preprocessor directives. In almost all languages, you need to understand the difference between stack and heap, know how to minimize allocations, know how to minimize dynamic dispatch, know how to efficiently structure cache-friendly memory layouts. And of course, data structures & algorithms 101.
In terms of performance, Julia provides the following:
1. Zero-cost abstractions. And since it has homoiconic macros, users can create their own zero-cost abstractions, e.g. AoS to SoA conversions, auto-vectorization. Managing the complexity-performance trade-off is critical. But you don't see that in micro-benchmarks.
2. Fast iteration speed. Julia is optimized for interactive computing. I can compile any function into its SSA form, LLVM IR, or native assembly, and I can inspect all of these in a Pluto notebook. Optimizing Julia is fun, which is less true of other languages.
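Those inspection stages are each one macro away in the REPL (the function here is just a toy):

```julia
f(x) = 2x + 1

@code_lowered f(3)  # lowered SSA-form IR
@code_typed f(3)    # IR after type inference
@code_llvm f(3)     # LLVM IR
@code_native f(3)   # native assembly for the host CPU
```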
https://github.com/JuliaLang/julia/issues/40190
Edit: it's now building:
https://github.com/JuliaLang/docs.julialang.org/runs/2196972...
By the download speed, it might take a few hours before I can plot something.
It also seems that just doing "git clone JuliaRegistries/General.git" is much faster than doing "] add Plots"